We consider the estimation of integrated covariance (ICV) matrices of
high dimensional diffusion processes based on high frequency
observations. We start by studying the most commonly used estimator, the
realized covariance (RCV) matrix. We show that in the high dimensional
case when the dimension p and the observation frequency n grow at the
same rate, the limiting spectral distribution (LSD) of RCV depends on the
covolatility process not only through the targeting ICV, but also on how
the covolatility process varies in time. We establish a Marcenko-Pastur
type theorem for weighted sample covariance matrices, based on which we
obtain a Marcenko-Pastur type theorem for RCV for a class C of diffusion
processes. The results explicitly demonstrate how the time variability of
the covolatility process affects the LSD of RCV. We further propose an
alternative estimator, the time-variation adjusted realized covariance
(TVARCV) matrix. We show that for processes in class C, the TVARCV
possesses the desirable property that its LSD depends solely on that of
the targeting ICV through the Marcenko-Pastur equation, and hence, in
particular, the TVARCV can be used to recover the empirical spectral
distribution of the ICV by using existing algorithms.

1. Introduction. 

1.1. Background. Diffusion processes are widely used to model financial
asset price processes. For example, suppose that we have multiple stocks,
say, p stocks whose price processes are denoted by $S_t^{(j)}$ for
$j = 1, \ldots, p$, and $X_t^{(j)} := \log S_t^{(j)}$ are the log price
processes. Let $\mathbf{X}_t = (X_t^{(1)}, \ldots, X_t^{(p)})^T$. Then
X. ZHENG AND Y. LI

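a widely used model for $\mathbf{X}_t$ is [see, e.g., Definition 1 in
Barndorff-Nielsen and Shephard (2004)]

(1.1)   $d\mathbf{X}_t = \boldsymbol{\mu}_t\,dt + \Theta_t\,d\mathbf{W}_t,$

where $\boldsymbol{\mu}_t = (\mu_t^{(1)}, \ldots, \mu_t^{(p)})^T$ is a
p-dimensional drift process; $\Theta_t$ is a p x p matrix for any t, and is
called the (instantaneous) covolatility process; and $\mathbf{W}_t$ is a
p-dimensional standard Brownian motion.
The integrated covariance (ICV) matrix

$\Sigma_p := \int_0^1 \Theta_t \Theta_t^T\,dt$

is of great interest in financial applications, which in the one dimensional
case is known as the integrated volatility. A widely used estimator of the ICV
matrix is the so-called realized covariance (RCV) matrix, which is defined as
follows. Assume that we can observe the processes $X_t^{(j)}$'s at high
frequency synchronously, say, at time points $\tau_{n,\ell}$:

$X^{(j)}_{\tau_{n,\ell}} \; (= \log S^{(j)}_{\tau_{n,\ell}}), \quad \ell = 0, 1, \ldots, n, \; j = 1, \ldots, p,$

then the RCV matrix is defined as

(1.2)   $\Sigma_p^{\mathrm{RCV}} := \sum_{\ell=1}^{n} \Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T,$

where

$\Delta\mathbf{X}_\ell = (\Delta X_\ell^{(1)}, \ldots, \Delta X_\ell^{(p)})^T := (X^{(1)}_{\tau_{n,\ell}} - X^{(1)}_{\tau_{n,\ell-1}}, \ldots, X^{(p)}_{\tau_{n,\ell}} - X^{(p)}_{\tau_{n,\ell-1}})^T$

is the vector of log-price increments over the $\ell$-th observation interval.
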

In the one dimensional case, the RCV matrix reduces to the realized volatil- 
ity. Thanks to its nice convergence to the ICV matrix as the observation 
frequency n goes to infinity [see Jacod and Protter (1998)], the RCV matrix 
is highly appreciated in both academic research and practical applications. 
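
As a concrete illustration of definition (1.2), the following minimal sketch computes the RCV matrix from a table of synchronously observed log prices. The array layout (one row per observation time, one column per asset), the function name `realized_covariance` and the simulated toy data are assumptions made for this example only.

```python
import numpy as np

def realized_covariance(log_prices):
    """RCV matrix (1.2): sum of outer products of log-price increments.

    log_prices: array of shape (n + 1, p); row l holds the p log prices
    observed at the l-th time point (a layout assumed for this sketch).
    """
    X = np.asarray(log_prices, dtype=float)
    dX = np.diff(X, axis=0)          # (n, p) matrix of increments
    return dX.T @ dX                 # equals sum_l dX_l dX_l^T

# Toy usage: simulate (1.1) with zero drift and identity covolatility.
rng = np.random.default_rng(0)
n, p = 390, 5
dW = rng.standard_normal((n, p)) / np.sqrt(n)
X = np.vstack([np.zeros(p), np.cumsum(dW, axis=0)])
rcv = realized_covariance(X)
```

With p fixed and n large, as in this toy run, the RCV matrix should be close to the ICV matrix (here the identity); the rest of the paper concerns the regime where p grows with n.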

Remark 1. The tick-by-tick data are usually not observed synchronously,
and moreover are contaminated by market microstructure noise. On
sparsely sampled data (e.g., 5-minute data for some highly liquid assets, 
or subsample from data synchronized by refresh times [Barndorff-Nielsen 
et al. (2011)]), the theory in this paper should be readily applicable, just 
as one can use the realized volatility based on sparsely sampled data to 
estimate the integrated volatility; see, for example, Andersen et al. (2001). 

1.2. Large dimensional random matrix theory (LDRMT). Having a good
estimate of the ICV matrix $\Sigma_p$, in particular, its spectrum (i.e., its
set of eigenvalues $\{\lambda_j : j = 1, \ldots, p\}$), is crucial in many
applications such as principal component analysis and portfolio optimization
(see, e.g., the pioneer work of Markowitz (1952, 1959) and a more recent work
[Bai, Liu and Wong (2009)]). When the dimension p is high, it is more
convenient to study, instead of the p eigenvalues
$\{\lambda_j : j = 1, \ldots, p\}$, the associated empirical spectral
distribution (ESD)

$F^{\Sigma_p}(x) := \frac{1}{p}\,\#\{j : \lambda_j \le x\}, \quad x \in \mathbb{R}.$

A naive estimator of the spectrum of the ICV matrix $\Sigma_p$ is the spectrum
of the RCV matrix $\Sigma_p^{\mathrm{RCV}}$. In particular, one wishes that the
ESD $F^{\Sigma_p^{\mathrm{RCV}}}$ of $\Sigma_p^{\mathrm{RCV}}$ would approximate
$F^{\Sigma_p}$ well when the frequency n is sufficiently high. From the large
dimensional random matrix theory (LDRMT), we now understand quite well that in
the high dimensional setting this good wish won't come true. For example, in
the simplest case when the drift process is 0, covolatility process is
constant, and observation times $\tau_{n,\ell}$ are equally spaced, namely,
$\tau_{n,\ell} = \ell/n$, we are in the setting of estimating the usual
covariance matrix using the sample covariance matrix, given n i.i.d.
p-dimensional observations $(\Delta\mathbf{X}_\ell)_{\ell=1,\ldots,n}$. From
LDRMT, we know that if p/n converges to a non-zero number and the ESD
$F^{\Sigma_p}$ of the true covariance matrix converges, then the ESD
$F^{\Sigma_p^{\mathrm{RCV}}}$ of the sample covariance matrix also converges;
see, for example, Marcenko and Pastur (1967), Yin (1986), Silverstein and Bai
(1995) and Silverstein (1995). The relationship between the limiting spectral
distribution (LSD) of $\Sigma_p^{\mathrm{RCV}}$ in this case and the LSD of
$\Sigma_p$ can be described by a Marcenko-Pastur equation through Stieltjes
transforms, as follows.

Proposition 1 [Theorem 1.1 of Silverstein (1995)]. Assume on a common
probability space:

(i) for p = 1, 2, ... and for $1 \le \ell \le n$,
$\mathbf{Z}_\ell^{(p)} = (Z_\ell^{(p,j)})_{1 \le j \le p}$ with $Z_\ell^{(p,j)}$
i.i.d. with mean 0 and variance 1;

(ii) n = n(p) with $y_n := p/n \to y > 0$ as $p \to \infty$;

(iii) $\Sigma_p$ is a (possibly random) nonnegative definite p x p matrix such
that its ESD $F^{\Sigma_p}$ converges almost surely in distribution to a
probability distribution H on $[0, \infty)$ as $p \to \infty$;

(iv) $\Sigma_p$ and $\mathbf{Z}_\ell^{(p)}$'s are independent.

Let $\Sigma_p^{1/2}$ be the (nonnegative) square root matrix of $\Sigma_p$ and
$S_p := \frac{1}{n} \sum_{\ell=1}^{n} \Sigma_p^{1/2} \mathbf{Z}_\ell^{(p)} (\mathbf{Z}_\ell^{(p)})^T \Sigma_p^{1/2}$.
Then, almost surely, the ESD of $S_p$ converges in distribution to a
probability distribution F, which is determined by H in that its Stieltjes
transform

$m_F(z) := \int_{\lambda \in \mathbb{R}} \frac{1}{\lambda - z}\,dF(\lambda), \quad z \in \mathbb{C}^+ := \{z \in \mathbb{C} : \mathrm{Im}(z) > 0\},$

solves the equation

(1.3)   $m_F(z) = \int_{\tau \in \mathbb{R}} \frac{1}{\tau(1 - y(1 + z\,m_F(z))) - z}\,dH(\tau).$

In the special case when $\Sigma_p = \sigma^2 I_{p \times p}$, where
$I_{p \times p}$ is the p x p identity matrix, the LSD F can be explicitly
expressed as follows.

Proposition 2 [see, e.g., Theorem 2.5 in Bai (1999)]. Suppose that
$\mathbf{Z}_\ell^{(p)}$'s are as in the previous proposition, and
$\Sigma_p = \sigma^2 I_{p \times p}$ for some $\sigma^2 > 0$. Then the LSD F
has density

$p(x) = \frac{1}{2\pi\sigma^2 x y} \sqrt{(b - x)(x - a)} \quad \text{if } a \le x \le b,$

and a point mass 1 - 1/y at the origin if y > 1, where

(1.4)   $a = a(y) = \sigma^2 (1 - \sqrt{y})^2$ and $b = b(y) = \sigma^2 (1 + \sqrt{y})^2.$

The LSD F in this proposition is called the Marcenko-Pastur law with
ratio index y and scale index $\sigma^2$, and will be denoted by
$\mathrm{MP}^{(y,\sigma^2)}$ in this article.
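
To make Proposition 2 concrete, here is a small numerical sketch of the support edges (1.4) and of the density of the continuous part. The function names and the midpoint-rule integration are choices made for this illustration; the check is that the continuous part carries mass 1 when y < 1 and mass 1/y when y > 1, the remainder 1 - 1/y being the atom at the origin.

```python
import math

def mp_support(y, sigma2=1.0):
    """Edges a(y), b(y) of the Marcenko-Pastur law, as in (1.4)."""
    a = sigma2 * (1.0 - math.sqrt(y)) ** 2
    b = sigma2 * (1.0 + math.sqrt(y)) ** 2
    return a, b

def mp_density(x, y, sigma2=1.0):
    """Density of the continuous part of MP^(y, sigma2) at x."""
    a, b = mp_support(y, sigma2)
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2.0 * math.pi * sigma2 * x * y)

def mp_continuous_mass(y, sigma2=1.0, grid=100_000):
    """Midpoint-rule integral of the density over [a, b]."""
    a, b = mp_support(y, sigma2)
    h = (b - a) / grid
    return h * sum(mp_density(a + (i + 0.5) * h, y, sigma2) for i in range(grid))

a, b = mp_support(0.25)
mass_small_y = mp_continuous_mass(0.25)   # y < 1: no atom at zero
mass_large_y = mp_continuous_mass(4.0)    # y > 1: atom 1 - 1/y at zero
```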

1.3. Back to the stochastic volatility case. In practice, the covolatility 
process is typically not constant. For example, it is commonly observed that 
the stock intraday volatility tends to be U-shaped [see, e.g., Admati and 
Pfleiderer (1988), Andersen and Bollerslev (1997)] or exhibits some other 
patterns [see, e.g., Andersen and Bollerslev (1998)]. In this article, we shall 
allow the covolatility process to be not only varying in time but also
stochastic. Furthermore, we shall allow the observation times $\tau_{n,\ell}$
to be random. These generalizations make our study different in nature from
the LDRMT: in LDRMT the observations are i.i.d.; in our setting, the
observations $(\Delta\mathbf{X}_\ell)_{\ell=1,\ldots,n}$ may, first, be
dependent on each other, and second, have different distributions because
(i) the covolatility process may vary over time, and (ii) the observation
durations $\Delta\tau_{n,\ell} := \tau_{n,\ell} - \tau_{n,\ell-1}$ may be
different.

In general, for any time-varying covolatility process $\Theta_t$, we
associate it with a constant covolatility process given by the square root of
the ICV matrix

(1.5)   $\Theta_t^0 := \sqrt{\int_0^1 \Theta_s \Theta_s^T\,ds}$ for all $t \in [0, 1]$.

Let $\mathbf{X}_t^0$ be defined by replacing $\Theta_t$ with the constant
covolatility process $\Theta_t^0$ (and replacing $\boldsymbol{\mu}_t$ with 0,
and $\mathbf{W}_t$ with another independent Brownian motion, if necessary) in
(1.1). Observe that $\mathbf{X}_t$ and $\mathbf{X}_t^0$ share the same ICV
matrix at time 1. Based on $\mathbf{X}_t^0$, we have an associated RCV matrix

(1.6)   $\Sigma_p^{\mathrm{RCV}^0} = \sum_{\ell=1}^{n} \Delta\mathbf{X}_\ell^0 (\Delta\mathbf{X}_\ell^0)^T,$

which is estimating the same ICV matrix as $\Sigma_p^{\mathrm{RCV}}$.

Since $\Sigma_p^{\mathrm{RCV}}$ and $\Sigma_p^{\mathrm{RCV}^0}$ are based on the
same estimation method and share the same targeting ICV matrix, it is
desirable that their ESDs have similar properties. In particular, based on the
results in LDRMT and the discussion about the constant covolatility case in
Section 1.2, we have the following property for $\Sigma_p^{\mathrm{RCV}^0}$: if
the ESD $F^{\Sigma_p}$ converges, then so does $F^{\Sigma_p^{\mathrm{RCV}^0}}$;
moreover, their limits are related to each other via the Marcenko-Pastur
equation (1.3). Does this property also hold for $\Sigma_p^{\mathrm{RCV}}$? Our
first result (Proposition 3) shows that even in the most ideal case when the
covolatility process has the form $\Theta_t = \gamma_t \cdot I_{p \times p}$ for
some deterministic (scalar) function $\gamma_t$, such convergence results may
not hold for $\Sigma_p^{\mathrm{RCV}}$. In particular, the limit of
$F^{\Sigma_p^{\mathrm{RCV}}}$ (when it exists) changes according to how the
covolatility process evolves over time.

This leads to the following natural and interesting question: how does 
the LSD of RCV matrix depend on the time-variability of the covolatility 
process? Answering this question in a general context without putting any 
structural assumption on the covolatility process seems to be rather chal- 
lenging, if not impossible. For a class C (see Section 2) of processes, we 
do establish a result for RCV matrices that's analogous to the Marcenko- 
Pastur theorem (see Proposition 5), which demonstrates clearly how the 
time-variability of the covolatility process affects the LSD of RCV matrix. 
Proposition 5 is proved based on Theorem 1, which is a Marcenko-Pastur 
type theorem for weighted sample covariance matrices. These results, in prin- 
ciple, allow one to recover the LSD of ICV matrix based on that of RCV 
matrix. 

Estimating high dimensional ICV matrices based on high frequency data 
has only recently started to gain attention. See, for example, Wang and Zou 
(2010); Tao et al. (2011) who made use of data over long time horizons by 
proposing a method incorporating low-frequency dynamics; and Fan, Li and 
Yu (2011) who studied the estimation of ICV matrices for portfolio alloca- 
tion under gross exposure constraint. In Wang and Zou (2010), under spar- 
sity assumptions on the ICV matrix, banding/thresholding was innovatively 
used to construct consistent estimators of the ICV matrix in the spectral 
norm sense. In particular, when the sparsity assumptions are satisfied, their 
estimators share the same LSD as the ICV matrix. It remains an open question
whether one can still make good inference about the spectrum of the ICV matrix
when the sparsity assumptions are not satisfied. For processes
in class C (see Section 2), whose ICV matrices do not need to be sparse,
we propose a new estimator, the time-variation adjusted realized covariance 
(TVARCV) matrix. We show that the TVARCV matrix has the desirable 
property that its LSD exists provided that the LSD of ICV matrix exists, 
and furthermore, the two LSDs are related to each other via the Marcenko- 
Pastur equation (1.3) (see Theorem 2). Therefore, the TVARCV matrix can 
be used, for example, to recover the LSD of ICV matrix by inverting the 
Marcenko-Pastur equation using existing algorithms. 

The rest of the paper is organized as follows: theoretical results are
presented in Section 2, proofs are given in Section 3, simulation studies in 
Section 4, and conclusion and discussions in Section 5. 

Notation. For any matrix A, $\|A\| = \sqrt{\lambda_{\max}(AA^*)}$ denotes its
spectral norm. For any Hermitian matrix A, $F^A$ stands for its ESD. For two
matrices A and B, we write $A \le B$ ($A \ge B$, resp.) if B - A (A - B,
resp.) is a nonnegative definite matrix. For any interval
$I \subseteq [0, \infty)$, and any metric space S, D(I; S) stands for the
space of cadlag functions from I to S. Additionally, $i = \sqrt{-1}$ stands
for the imaginary unit, and for any $z \in \mathbb{C}$, we write Re(z), Im(z)
as its real part and imaginary part, respectively, and $\bar{z}$ as its
complex conjugate. We also denote
$\mathbb{R}_+ = \{a \in \mathbb{R} : a > 0\}$,
$\mathbb{C}_+ = \{z \in \mathbb{C} : \mathrm{Re}(z) > 0\}$ and
$Q_1 = \{z \in \mathbb{C} : \mathrm{Re}(z) \ge 0, \mathrm{Im}(z) \ge 0\}$. We
follow the custom of writing $f \sim g$ to mean that the ratio f/g converges
to 1. Finally, throughout the paper, c, C, $C_1$, C' etc. denote generic
constants whose values may change from line to line.

2. Main results. 

2.1. Dependence of the LSD of RCV matrix on the time-variability of
covolatility process. Proposition 1 asserts that the ESD of sample covariance
matrix converges to a limiting distribution which is uniquely determined by
the LSD of the underlying covariance matrix. Unfortunately, Proposition 1
does not apply to our case, since the observations $\Delta\mathbf{X}_\ell$
under our general diffusion process setting are not i.i.d. Proposition 3
below shows that even in the following most ideal case, the RCV matrix does
not have the desired convergence property.

Proposition 3. Suppose that for all p, $\mathbf{X}_t = \mathbf{X}_t^{(p)}$ is a
p-dimensional process satisfying

(2.1)   $d\mathbf{X}_t = \gamma_t\,d\mathbf{W}_t, \quad t \in [0, 1],$

where $\gamma_t > 0$ is a nonrandom (scalar) cadlag process. Let
$\sigma^2 = \int_0^1 \gamma_t^2\,dt$, so that the ICV matrix $\Sigma_p$ is
$\sigma^2 I_{p \times p}$. Assume further that the observation times
$\tau_{n,\ell}$ are equally spaced, that is, $\tau_{n,\ell} = \ell/n$, and that
the RCV matrix $\Sigma_p^{\mathrm{RCV}}$ is defined by (1.2). Then so long as
$\gamma_t$ is not constant on [0, 1), for any $\varepsilon > 0$, there exists
$y_c = y_c(\gamma, \varepsilon) > 0$ such that if $\lim p/n = y > y_c$,

(2.2)   $\limsup F^{\Sigma_p^{\mathrm{RCV}}}(b(y) + \sigma^2 \varepsilon) < 1$ almost surely.

In particular, $F^{\Sigma_p^{\mathrm{RCV}}}$ does not converge to the
Marcenko-Pastur law $\mathrm{MP}^{(y,\sigma^2)}$.

Observe that $\mathrm{MP}^{(y,\sigma^2)}$ is the LSD of RCV matrix when
$\gamma_t \equiv \sigma$. The main message of Proposition 3 is that the LSD of
RCV matrix depends on the whole covolatility process not only through
$\Sigma_p$, but also on how the covolatility process varies in time. It will
also be clear from the proof of Proposition 3 (Section 3.2) that the more
"volatile" the covolatility process is, the further away the LSD is from the
Marcenko-Pastur law $\mathrm{MP}^{(y,\sigma^2)}$. This is also illustrated in
the simulation study in Section 4.
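
The effect described in Proposition 3 can be seen in a small experiment. The sketch below is not the simulation design of Section 4; it assumes a piecewise-constant $\gamma_t^2$ (equal to 3.7 on the first quarter of [0, 1] and 0.1 afterwards, so that $\sigma^2 = 1$), equally spaced observations, and the hypothetical helper name `rcv_top_eigenvalue`. It compares the largest RCV eigenvalue with the Marcenko-Pastur right edge b(y).

```python
import numpy as np

def rcv_top_eigenvalue(gamma2, p, n, rng):
    """Largest eigenvalue of the RCV matrix for dX_t = gamma_t dW_t.

    gamma2: length-n array, value of gamma_t^2 on ((l-1)/n, l/n]
    (piecewise-constant volatility is an assumption of this sketch).
    """
    Z = rng.standard_normal((n, p))
    dX = np.sqrt(gamma2 / n)[:, None] * Z      # increments over equal spacing
    return np.linalg.eigvalsh(dX.T @ dX)[-1]

rng = np.random.default_rng(1)
p, n = 200, 400                                 # y = p/n = 0.5
y = p / n
b = (1.0 + np.sqrt(y)) ** 2                     # right edge b(y) for sigma^2 = 1

flat = np.ones(n)                               # gamma_t^2 = 1
burst = np.where(np.arange(n) < n // 4, 3.7, 0.1)  # integrates to 1 as well

top_flat = rcv_top_eigenvalue(flat, p, n, rng)
top_burst = rcv_top_eigenvalue(burst, p, n, rng)
```

Both volatility paths have the same ICV matrix, yet `top_flat` should sit near b(y) while `top_burst` should land well above it, which is the kind of mass beyond the Marcenko-Pastur support that (2.2) expresses.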

2.2. The class C . To understand the behavior of the ESD of RCV matrix 
more clearly, we next focus on a special class of diffusion processes for which 
we (i) establish a Marcenko-Pastur type theorem for RCV matrices; and (ii) 
propose an alternative estimator of ICV matrix. 

Definition 1. Suppose that $\mathbf{X}_t$ is a p-dimensional process
satisfying (1.1) and $\Theta_t$ is cadlag. We say that $\mathbf{X}_t$ belongs
to class C if, almost surely, there exist
$(\gamma_t) \in D([0, 1]; \mathbb{R})$ and $\Lambda$ a p x p matrix satisfying
$\mathrm{tr}(\Lambda\Lambda^T) = p$ such that

(2.3)   $\Theta_t = \gamma_t \Lambda.$

Observe that if (2.3) holds, then the ICV matrix
$\Sigma_p = \int_0^1 \gamma_t^2\,dt \cdot \Lambda\Lambda^T$. We note that
$\Lambda$ does not need to be sparse, hence neither does $\Sigma_p$.

A special case is when $\Lambda = I_{p \times p}$. This type of process is
studied in Proposition 3 and in the simulation studies in Section 4.

A more interesting case is the following. 

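Proposition 4. Suppose that $X_t^{(j)}$ satisfy

(2.4)   $dX_t^{(j)} = \mu_t^{(j)}\,dt + \sigma_t^{(j)}\,dW_t^{(j)}, \quad j = 1, \ldots, p,$

where $\mu_t^{(j)}, \sigma_t^{(j)} \in D([0, 1]; \mathbb{R})$ are the drift and
volatility processes for stock j, and $W_t^{(j)}$'s are (one-dimensional)
standard Brownian motions. If the following conditions hold:

(i) the correlation matrix process of $(W_t^{(j)})$

(2.5)   $R_t := \left( \frac{\langle W^{(j)}, W^{(k)} \rangle_t}{t} \right)_{1 \le j,k \le p} := (r^{(jk)})_{1 \le j,k \le p}$

is constant in $t \in (0, 1]$;

(ii) $r^{(jk)} \ne 0$ for all $1 \le j, k \le p$; and

(iii) the correlation matrix process of $(X_t^{(j)})$

$\left( \frac{\int_0^t \sigma_s^{(j)} \sigma_s^{(k)} r^{(jk)}\,ds}{\sqrt{\int_0^t (\sigma_s^{(j)})^2\,ds \cdot \int_0^t (\sigma_s^{(k)})^2\,ds}} \right)_{1 \le j,k \le p} := (\rho^{(jk)})_{1 \le j,k \le p}$

is constant in $t \in (0, 1]$;

then $(X_t^{(j)})$ belongs to class C.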

The proof is given in the supplementary article [Zheng and Li (2011)]. 
Equation (2.4) is another common way of representing multi-dimensional
log-price processes. We note that if $X_t^{(j)}$ are log price processes, then
over a short time period, say, one day, it is reasonable to assume that the
correlation structure of $(X_t^{(j)})$ does not change, hence by this
proposition, $(X_t^{(j)})$ belongs to class C.

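Observe that if a diffusion process $\mathbf{X}_t$ belongs to class C, the
drift process $\boldsymbol{\mu}_t \equiv 0$, and $\tau_{n,\ell}$'s and
$\gamma_t$ are independent of $\mathbf{W}_t$, then

$\Delta\mathbf{X}_\ell = \int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t \Lambda\,d\mathbf{W}_t \overset{d}{=} \sqrt{\int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t^2\,dt} \cdot \breve{\Sigma}^{1/2} \cdot \mathbf{Z}_\ell,$

where "$\overset{d}{=}$" stands for "equal in distribution,"
$\breve{\Sigma}^{1/2}$ is the nonnegative square root matrix of
$\breve{\Sigma} := \Lambda\Lambda^T$, and
$\mathbf{Z}_\ell = (Z_\ell^{(1)}, \ldots, Z_\ell^{(p)})^T$ consists of
independent standard normals. Therefore the RCV matrix

$\Sigma_p^{\mathrm{RCV}} = \sum_{\ell=1}^{n} \Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T \overset{d}{=} \sum_{\ell=1}^{n} w_\ell^n \cdot \breve{\Sigma}^{1/2} \mathbf{Z}_\ell (\mathbf{Z}_\ell)^T \breve{\Sigma}^{1/2},$

where $w_\ell^n = \int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t^2\,dt$. This is
similar to the $S_p$ in Proposition 1, except that here the "weights"
$w_\ell^n$ may vary in $\ell$, while in Proposition 1 the "weights" are
constantly 1/n. Motivated by this observation we develop the following
Marcenko-Pastur type theorems for weighted sample covariance matrices and RCV
matrices.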
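
The weights in this representation are easy to tabulate for a given volatility path. The following sketch assumes a hypothetical U-shaped profile `gamma2` with $\int_0^1 \gamma_t^2\,dt = 1$, equally spaced observation times, and a midpoint-rule quadrature; none of these choices come from the paper. It shows that the rescaled weights $n w_\ell^n$ trace out $\gamma_t^2$ and stay bounded, whereas the constant-volatility case would give $n w_\ell^n = 1$ for every $\ell$.

```python
import math

def weights(gamma2, times, sub=200):
    """w_l = integral of gamma_t^2 over (times[l-1], times[l]], midpoint rule."""
    out = []
    for lo, hi in zip(times[:-1], times[1:]):
        h = (hi - lo) / sub
        out.append(h * sum(gamma2(lo + (k + 0.5) * h) for k in range(sub)))
    return out

def gamma2(t):
    """Hypothetical U-shaped intraday variance profile on [0, 1]."""
    return 0.5 + 6.0 * (t - 0.5) ** 2

n = 50
times = [l / n for l in range(n + 1)]          # equally spaced observation times
w = weights(gamma2, times)
sigma2 = sum(w)                                # integrated variance over [0, 1]
kappa = max(n * wl for wl in w)                # largest rescaled weight n * w_l
```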

2.3. Marcenko-Pastur type theorems for weighted sample covariance ma- 
trices and RCV matrices. 

Theorem 1. Suppose that assumptions (ii) and (iv) in Proposition 1
hold. Assume further that:

(A.i') for p = 1, 2, ... and $1 \le \ell \le n$,
$\mathbf{Z}_\ell^{(p)} = (Z_\ell^{(p,j)})_{1 \le j \le p}$ with $Z_\ell^{(p,j)}$
i.i.d. with mean 0, variance 1 and finite moments of all orders;

(A.iii') $\Sigma_p$ is a (possibly random) nonnegative definite p x p matrix
such that its ESD $F^{\Sigma_p}$ converges almost surely in distribution to a
probability distribution H on $[0, \infty)$ as $p \to \infty$; moreover, H has
a finite second moment;

(A.v) the weights $w_\ell^n$, $1 \le \ell \le n$, n = 1, 2, ..., are all
positive, and there exists $\kappa < \infty$ such that the rescaled weights
$(n w_\ell^n)$ satisfy

$\max_n \max_{\ell = 1, \ldots, n} (n w_\ell^n) \le \kappa;$

moreover, almost surely, there exists a process
$w_s \in D([0, 1]; \mathbb{R}_+)$ such that

(2.7)   $\lim_n \sum_{1 \le \ell \le n} \int_{(\ell-1)/n}^{\ell/n} |n w_\ell^n - w_s|\,ds = 0;$

(A.vi) there exists a sequence $\eta_p = o(p)$ and a sequence of index sets
$\mathcal{I}_p$ satisfying $\mathcal{I}_p \subset \{1, \ldots, p\}$ and
$\#\mathcal{I}_p \le \eta_p$ such that for all n and all $\ell$, $w_\ell^n$ may
depend on $\mathbf{Z}_\ell^{(p)}$ but only on
$\{Z_\ell^{(p,j)} : j \in \mathcal{I}_p\}$;

(A.vii) there exist $C < \infty$ and $\delta < 1/6$ such that for all p,
$\|\Sigma_p\| \le C p^{\delta}$ almost surely.

Define $S_p = \sum_{\ell=1}^{n} w_\ell^n \cdot \Sigma_p^{1/2} \mathbf{Z}_\ell^{(p)} (\mathbf{Z}_\ell^{(p)})^T \Sigma_p^{1/2}$.
Then, almost surely, the ESD of $S_p$ converges in distribution to a
probability distribution $F^w$, which is determined by H and $(w_s)$ in that
its Stieltjes transform $m_{F^w}(z)$ is given by

(2.8)   $m_{F^w}(z) = -\frac{1}{z} \int_{\tau \in \mathbb{R}} \frac{1}{\tau M(z) + 1}\,dH(\tau),$

where M(z), together with another function $\tilde{m}(z)$, uniquely solve the
following equation in $\mathbb{C}^+ \times \mathbb{C}^+$:

(2.9)   $M(z) = -\frac{1}{z} \int_0^1 \frac{w_s}{1 + y\,\tilde{m}(z)\,w_s}\,ds, \qquad \tilde{m}(z) = -\frac{1}{z} \int_{\tau \in \mathbb{R}} \frac{\tau}{\tau M(z) + 1}\,dH(\tau).$

Remark 2. Assumption (A.i') can undoubtedly be weakened, for example, by
using the truncation and centralization technique as in Silverstein and
Bai (1995) and Silverstein (1995); or, a closer look at the proof of Theorem 1
indicates that as long as $Z_\ell^{(p,j)}$ has finite moments up to order
$k > 6/(1 - 6\delta)$, the theorem is true and can be proved by exactly the
same argument.

Remark 3. If $w_\ell^n \equiv 1/n$, then $w_s \equiv 1$, and Theorem 1 reduces
to Proposition 1. Moreover, if $w_s$ is not constant, that is,
$w_s \not\equiv \int_0^1 w_t\,dt$ on [0, 1), then except in the trivial case
when H is a delta measure at 0, the LSD $F^w \ne F$, where F is the LSD in
Proposition 1 determined by $H(\cdot / \int_0^1 w_t\,dt)$. See the
supplementary article [Zheng and Li (2011)] for more details.
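
The first claim of Remark 3 can be checked by direct substitution. The sketch below assumes that the system (2.8)-(2.9) pairs M(z) with $\tilde{m}(z)$ as written in its first display; under that assumption, setting $w_s \equiv 1$ recovers the Marcenko-Pastur equation (1.3) with $m = m_{F^w}$.

```latex
% Assumed form of (2.8)-(2.9), with m = m_{F^w}(z), M = M(z), \tilde m = \tilde m(z):
\begin{align*}
m &= -\frac{1}{z}\int \frac{1}{\tau M + 1}\,dH(\tau), &
M &= -\frac{1}{z}\int_0^1 \frac{w_s}{1 + y\tilde m w_s}\,ds, &
\tilde m &= -\frac{1}{z}\int \frac{\tau}{\tau M + 1}\,dH(\tau).
\end{align*}
% Step 1: with w_s \equiv 1 the integral over s is trivial.
\[
M = -\frac{1}{z\,(1 + y\tilde m)} .
\]
% Step 2: since \int \frac{\tau M}{\tau M + 1}\,dH = 1 - \int \frac{1}{\tau M + 1}\,dH = 1 + z m,
\[
\tilde m = -\frac{1 + z m}{z M} = (1 + z m)(1 + y\tilde m)
\quad\Longrightarrow\quad
1 + y\tilde m = \frac{1}{1 - y(1 + z m)} .
\]
% Step 3: hence M = -\frac{1 - y(1 + z m)}{z}, so \tau M + 1 = \frac{z - \tau(1 - y(1 + z m))}{z}, and
\[
m = -\frac{1}{z}\int \frac{z}{z - \tau(1 - y(1 + z m))}\,dH(\tau)
  = \int \frac{1}{\tau(1 - y(1 + z m)) - z}\,dH(\tau),
\]
% which is equation (1.3).
```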

Theorem 1 is proved in Section 3.3.

A direct consequence of this theorem and Lemma 1 below is the following
Marcenko-Pastur type result for RCV matrices for diffusion processes in
class C. We note that, thanks to Lemma 1 below (see the remark after the
proof of Lemma 1 for more explanations), we put no assumption on the drift
processes other than requiring them to be uniformly bounded: they can be,
for example, stochastic, cadlag and dependent on each other. Furthermore,
we allow for dependence between the covolatility process and the underlying
Brownian motion; in other words, we allow for the leverage effect. In the
special case when $\gamma_t^{(p)}$ does not change in p, is nonrandom and
bounded, and the observation times are equally spaced, the (rather
technical) assumptions (B.iii) and (B.iv) below are trivially satisfied.

Proposition 5. Suppose that for all p, $\mathbf{X}_t^{(p)}$ is a
p-dimensional process in class C for some drift process
$\boldsymbol{\mu}_t^{(p)} = (\mu_t^{(p,1)}, \ldots, \mu_t^{(p,p)})^T$,
covolatility process $\Theta_t^{(p)} = \gamma_t^{(p)} \Lambda^{(p)}$ and
p-dimensional Brownian motion
$\mathbf{W}_t^{(p)} = (W_t^{(p,1)}, \ldots, W_t^{(p,p)})^T$. Suppose further
that:

(B.i) there exists $C_0 < \infty$ such that for all p and all
$j = 1, \ldots, p$, $|\mu_t^{(p,j)}| \le C_0$ for all $t \in [0, 1)$ almost
surely;

(B.ii) $\breve{\Sigma}_p = \Lambda^{(p)} (\Lambda^{(p)})^T$ satisfies
assumptions (A.iii') and (A.vii) in Theorem 1;

(B.iii) there exists a sequence $\eta_p = o(p)$ and a sequence of index sets
$\mathcal{I}_p$ satisfying $\mathcal{I}_p \subset \{1, \ldots, p\}$ and
$\#\mathcal{I}_p \le \eta_p$ such that $\gamma_t^{(p)}$ may depend on
$\mathbf{W}_t^{(p)}$ but only on $\{W_t^{(p,j)} : j \in \mathcal{I}_p\}$;
moreover, there exists $C_1 < \infty$ such that for all p,
$|\gamma_t^{(p)}| \le C_1$ for all $t \in [0, 1)$ almost surely; additionally,
almost surely, there exists $(\gamma_t) \in D([0, 1]; \mathbb{R})$ such that

$\lim_p \int_0^1 |\gamma_t^{(p)} - \gamma_t|\,dt = 0;$

(B.iv) the observation times $\tau_{n,\ell}$ are independent of
$\mathbf{X}_t$; moreover, there exists $\kappa < \infty$ such that the
observation durations
$\Delta\tau_{n,\ell} := \tau_{n,\ell} - \tau_{n,\ell-1}$ satisfy

$\max_n \max_{\ell = 1, \ldots, n} (n \cdot \Delta\tau_{n,\ell}) \le \kappa;$

additionally, almost surely, there exists a process
$\upsilon_s \in D([0, 1); \mathbb{R}_+)$ such that

$\tau_{n,[ns]} \to \Upsilon_s := \int_0^s \upsilon_r\,dr$ as $n \to \infty$ for all $0 \le s \le 1$,

where for any x, [x] stands for its integer part.

Then, as $p \to \infty$, $F^{\Sigma_p^{\mathrm{RCV}}}$ converges almost surely to
a probability distribution $F^w$ as specified in Theorem 1 for
$w_s = (\gamma_{\Upsilon_s})^2 \upsilon_s$.
Proposition 5 demonstrates explicitly how the LSD of RCV matrix depends on
the time-variability of the covolatility process. Hence, the RCV matrix by
itself cannot be used to make robust inference for the ESD $F^{\Sigma_p}$ of
the ICV matrix. If $(\gamma_t)$ [and hence
$w_s = (\gamma_{\Upsilon_s})^2 \upsilon_s$] is known, then in principle, the
equations (2.8) and (2.9) can be used to recover $F^{\Sigma_p}$. However, in
general, $(\gamma_t)$ is unknown, and estimating the process $(\gamma_t)$ can
be challenging and will bring more complication into the inference. Moreover,
the equations (2.8) and (2.9) are different from and more complicated than
the classical Marcenko-Pastur equation (1.3), and in order to recover
$F^{\Sigma_p}$ based on these equations, one has to extend existing algorithms
[El Karoui (2008), Mestre (2008) and Bai, Chen and Yao (2010) etc.] which are
designed for (1.3). Developing such an algorithm is of course of great
interest, but we shall not pursue this in the present article. We shall
instead propose an alternative estimator which overcomes these difficulties.

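2.4. Time-variation adjusted realized covariance (TVARCV) matrix. Suppose
that a diffusion process $\mathbf{X}_t$ belongs to class C. We define the
time-variation adjusted realized covariance (TVARCV) matrix as follows:

(2.10)   $\widehat{\Sigma}_p := \frac{\mathrm{tr}(\Sigma_p^{\mathrm{RCV}})}{n} \sum_{\ell=1}^{n} \frac{\Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T}{|\Delta\mathbf{X}_\ell|^2} = \frac{\mathrm{tr}(\Sigma_p^{\mathrm{RCV}})}{p}\,\widetilde{\Sigma}_p,$

where for any vector v, |v| stands for its Euclidean norm, and

(2.11)   $\widetilde{\Sigma}_p := \frac{p}{n} \sum_{\ell=1}^{n} \frac{\Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T}{|\Delta\mathbf{X}_\ell|^2}.$

Let us first explain $\widetilde{\Sigma}_p$. Consider the simplest case when
$\boldsymbol{\mu}_t \equiv 0$, $\gamma_t$ deterministic,
$\Lambda = I_{p \times p}$, and $\tau_{n,\ell} = \ell/n$,
$\ell = 0, 1, \ldots, n$. In this case,
$\Delta\mathbf{X}_\ell = \sqrt{\int_{(\ell-1)/n}^{\ell/n} \gamma_t^2\,dt} \cdot \mathbf{Z}_\ell$,
where $\mathbf{Z}_\ell = (Z_\ell^{(1)}, \ldots, Z_\ell^{(p)})^T$ and
$Z_\ell^{(j)}$'s are i.i.d. standard normal. Hence,
$\Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T / |\Delta\mathbf{X}_\ell|^2 = \mathbf{Z}_\ell (\mathbf{Z}_\ell)^T / |\mathbf{Z}_\ell|^2$.
However, as $p \to \infty$, $|\mathbf{Z}_\ell|^2 \sim p$, hence
$\widetilde{\Sigma}_p \sim \frac{1}{n} \sum_{\ell=1}^{n} \mathbf{Z}_\ell (\mathbf{Z}_\ell)^T$,
the latter being the usual sample covariance matrix. We will show that,
first, $\mathrm{tr}(\Sigma_p^{\mathrm{RCV}}) \sim \mathrm{tr}(\Sigma_p)$; and
second, if $\mathbf{X}_t$ belongs to class C and satisfies certain additional
assumptions, then the LSD of $\widetilde{\Sigma}_p$ is related to that of
$\breve{\Sigma}_p$ via the Marcenko-Pastur equation (1.3), where

(2.12)   $\breve{\Sigma}_p = \frac{p}{\mathrm{tr}(\Sigma_p)}\,\Sigma_p = \Lambda\Lambda^T.$

Hence, the LSD of $\widehat{\Sigma}_p$ is also related to that of $\Sigma_p$
via the same Marcenko-Pastur equation.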
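
A minimal sketch of the estimator defined in (2.10)-(2.11), under the assumptions of the simplest case just described ($\Lambda = I_{p \times p}$, zero drift, equally spaced observations) and with a hypothetical piecewise-constant $\gamma_t^2$ that integrates to 1. The function name `tvarcv` and the simulated data are illustrative choices, not part of the paper.

```python
import numpy as np

def tvarcv(dX):
    """TVARCV matrix (2.10) from an (n, p) array of log-price increments.

    Rows with zero Euclidean norm are not handled in this sketch.
    """
    dX = np.asarray(dX, dtype=float)
    n, p = dX.shape
    rcv = dX.T @ dX                                   # RCV matrix (1.2)
    unit = dX / np.linalg.norm(dX, axis=1, keepdims=True)
    sigma_tilde = (p / n) * (unit.T @ unit)           # (2.11)
    return np.trace(rcv) / p * sigma_tilde

# Class C toy example with Lambda = I and a time-varying gamma_t.
rng = np.random.default_rng(2)
p, n = 100, 200
gamma2 = np.where(np.arange(n) < n // 4, 3.7, 0.1)    # integrates to 1
dX = np.sqrt(gamma2 / n)[:, None] * rng.standard_normal((n, p))

rcv = dX.T @ dX
est = tvarcv(dX)
b = (1.0 + np.sqrt(p / n)) ** 2                       # MP right edge, sigma^2 = 1
top_rcv = np.linalg.eigvalsh(rcv)[-1]
top_tvarcv = np.linalg.eigvalsh(est)[-1]
```

By construction the TVARCV matrix has the same trace as the RCV matrix, since each normalized outer product has trace 1. In this toy run the largest TVARCV eigenvalue should stay close to the Marcenko-Pastur edge b(y), while the largest RCV eigenvalue is pushed far beyond it by the volatility burst.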

We now state our assumptions. Observe that, again, we put no assumption on
the drift processes other than requiring them to be uniformly bounded.
Furthermore, we allow for dependence between the covolatility process and the
underlying Brownian motion, namely, the leverage effect.

Assumptions: 

(C.i) there exists $C_0 < \infty$ such that for all p and all
$j = 1, \ldots, p$, $|\mu_t^{(p,j)}| \le C_0$ for all $t \in [0, 1)$ almost
surely;

(C.ii) there exist constants $C_1 < \infty$, $0 \le \delta_1 < 1/2$, a
sequence $\eta_p < C_1 p^{\delta_1}$ and a sequence of index sets
$\mathcal{I}_p$ satisfying $\mathcal{I}_p \subset \{1, \ldots, p\}$ and
$\#\mathcal{I}_p \le \eta_p$ such that $\gamma_t^{(p)}$ may depend on
$\mathbf{W}_t^{(p)}$ but only on $\{W_t^{(p,j)} : j \in \mathcal{I}_p\}$;
moreover, there exists $C_2 < \infty$ such that for all p,
$|\gamma_t^{(p)}| \in (1/C_2, C_2)$ for all $t \in [0, 1)$ almost surely;

(C.iii) there exists $C_3 < \infty$ such that for all p and for all j, the
individual volatilities
$\sigma_t^{(j)} = \sqrt{(\gamma_t^{(p)})^2 \cdot \sum_{k=1}^{p} (\Lambda_{jk}^{(p)})^2} \in (1/C_3, C_3)$
for all $t \in [0, 1]$ almost surely;

(C.iv) $\lim_{p \to \infty} \mathrm{tr}(\Sigma_p)/p \; (= \lim_{p \to \infty} \int_0^1 (\gamma_t^{(p)})^2\,dt) := \theta > 0$
almost surely;

(C.v) almost surely, as $p \to \infty$, the ESD $F^{\Sigma_p}$ converges to a
probability distribution H on $[0, \infty)$;

(C.vi) there exist $C_5 < \infty$ and $0 \le \delta_2 < 1/2$ such that for
all p, $\|\Sigma_p\| \le C_5 p^{\delta_2}$ almost surely;

(C.vii) the $\delta_1$ in (C.ii) and $\delta_2$ in (C.vi) satisfy
$\delta_1 + \delta_2 < 1/2$;

(C.viii) $p/n \to y \in (0, \infty)$ as $p \to \infty$; and

(C.ix) there exists $C_4 < \infty$ such that for all n,

$\max_{1 \le \ell \le n} n \cdot (\tau_{n,\ell} - \tau_{n,\ell-1}) \le C_4$ almost surely;

moreover, $\tau_{n,\ell}$'s are independent of $\mathbf{X}_t$.

We have the following convergence theorem regarding the ESD of our proposed
estimator, the TVARCV matrix $\widehat{\Sigma}_p$.

Theorem 2. Suppose that for all p, $\mathbf{X}_t = \mathbf{X}_t^{(p)}$ is a
p-dimensional process in class C for some drift process
$\boldsymbol{\mu}_t^{(p)} = (\mu_t^{(p,1)}, \ldots, \mu_t^{(p,p)})^T$,
covolatility process $\Theta_t^{(p)} = \gamma_t^{(p)} \Lambda^{(p)}$ and
p-dimensional Brownian motion $\mathbf{W}_t^{(p)}$, which satisfy assumptions
(C.i)-(C.vii) above. Suppose also that p and n satisfy (C.viii), and the
observation times satisfy (C.ix). Let $\widehat{\Sigma}_p$ be as in (2.10).
Then, as $p \to \infty$, $F^{\widehat{\Sigma}_p}$ converges almost surely to a
probability distribution F, which is determined by H through Stieltjes
transforms via the same Marcenko-Pastur equation (1.3) as in Proposition 1.

The proof of Theorem 2 is given in Section 3.4. 

The LSD H of the targeting ICV matrix is in general not the same as the
LSD F, but can be recovered from F based on equation (1.3). In practice,
when one has only a finite number of samples, the articles [El Karoui (2008),
Mestre (2008) and Bai, Chen and Yao (2010) etc.] studied the estimation
of the population spectral distribution based on the sample covariance
matrices. In particular, applying Theorem 2 of El Karoui (2008) to our case
yields the following.

Corollary 1. Let $H_p = F^{\Sigma_p}$, and define $\widehat{H}_p$ as in
Theorem 2 of El Karoui (2008). If $\|\Sigma_p\|$ are bounded in p, then, as
$p \to \infty$, $\widehat{H}_p \to H$ almost surely.

Therefore, when the dimension p is large, based on the ESD of the TVARCV
matrix $\widehat{\Sigma}_p$, we can estimate the spectrum of the underlying
ICV matrix $\Sigma_p$ well.
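
Recovering H from F requires, as a building block, the forward map from H to the Stieltjes transform $m_F(z)$ defined by (1.3). The sketch below evaluates that map for a discrete H by plain fixed-point iteration. It is an illustration only and is not the algorithm of El Karoui (2008) or of the other cited articles; the function names are hypothetical, and the iteration is only expected to behave well when Im(z) is not small. For $H = \delta_1$ the result is compared with the root of the quadratic that (1.3) reduces to.

```python
import cmath

def mp_stieltjes(z, taus, probs, y, iters=500):
    """Solve (1.3) for m_F(z) by fixed-point iteration.

    H is taken to be discrete with atoms taus and masses probs; plain
    iteration is an illustrative choice and is only expected to behave
    well when Im(z) is not small.
    """
    m = -1.0 / z
    for _ in range(iters):
        m = sum(q / (t * (1.0 - y * (1.0 + z * m)) - z) for t, q in zip(taus, probs))
    return m

def mp_stieltjes_identity(z, y):
    """Closed form for H = delta_1: root of y z m^2 + (z + y - 1) m + 1 = 0
    with the larger imaginary part."""
    disc = cmath.sqrt((z + y - 1.0) ** 2 - 4.0 * y * z)
    roots = [(-(z + y - 1.0) + s * disc) / (2.0 * y * z) for s in (1.0, -1.0)]
    return max(roots, key=lambda r: r.imag)

z, y = complex(1.0, 2.0), 0.5
m_iter = mp_stieltjes(z, [1.0], [1.0], y)
m_exact = mp_stieltjes_identity(z, y)
m_mix = mp_stieltjes(z, [1.0, 3.0], [0.5, 0.5], y)   # H = (delta_1 + delta_3) / 2
```

An inversion procedure would search over candidate distributions H so that this forward map matches the Stieltjes transform of the observed ESD of the TVARCV matrix on a grid of points z.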

3. Proofs. 

3.1. Preliminaries. We collect below some elementary or well-known facts.
The proofs are given in the supplemental article [Zheng and
Li (2011)].

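Lemma 1. Suppose that for each p,
$\mathbf{v}_\ell^{(p)} = (v_\ell^{(p,1)}, \ldots, v_\ell^{(p,p)})^T$ and
$\mathbf{w}_\ell^{(p)} = (w_\ell^{(p,1)}, \ldots, w_\ell^{(p,p)})^T$,
$\ell = 1, \ldots, n$, are all p-dimensional vectors. Define

$\widetilde{S}_n = \sum_{\ell=1}^{n} (\mathbf{v}_\ell^{(p)} + \mathbf{w}_\ell^{(p)}) \cdot (\mathbf{v}_\ell^{(p)} + \mathbf{w}_\ell^{(p)})^T$ and $S_n = \sum_{\ell=1}^{n} \mathbf{w}_\ell^{(p)} (\mathbf{w}_\ell^{(p)})^T.$

If the following conditions are satisfied:

(i) n = n(p) with $\lim_{p \to \infty} p/n = y > 0$;

(ii) there exists a sequence $\varepsilon_p = o(1/\sqrt{p})$ such that for all
p and all $\ell$, all the entries of $\mathbf{v}_\ell^{(p)}$ are bounded by
$\varepsilon_p$ in absolute value;

(iii) $\limsup_{p \to \infty} \mathrm{tr}(S_n)/p < \infty$ almost surely.

Then $L(F^{\widetilde{S}_n}, F^{S_n}) \to 0$ almost surely, where for any two
probability distribution functions F and G, L(F, G) denotes the Levy distance
between them.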

Lemma 2 [Lemma 2.6 of Silverstein and Bai (1995)]. Let $z \in \mathbb{C}$
with v = Im(z) > 0, A and B be p x p with B Hermitian, and
$q \in \mathbb{C}^p$. Then

$|\mathrm{tr}(((B - zI)^{-1} - (B + \tau q q^* - zI)^{-1}) \cdot A)| \le \|A\|/v$ for all $\tau \in \mathbb{R}$.

The following two lemmas are similar to Lemma 2.3 in Silverstein (1995). 

Lemma 3. Let $w \in \mathbb{C}$ with $\mathrm{Re}(w) \ge 0$, and A be an
Hermitian nonnegative definite matrix. Then $\|(wA + I)^{-1}\| \le 1$.

Lemma 4. Let $w_1, w_2 \in \mathbb{C}$ with $\mathrm{Re}(w_1) \ge 0$ and
$\mathrm{Re}(w_2) \ge 0$, A be a p x p Hermitian nonnegative definite matrix,
B any p x p matrix, and $q \in \mathbb{C}^p$. Then:

(i) $|\mathrm{tr}(B((w_1 A + I)^{-1} - (w_2 A + I)^{-1}))| \le p \cdot |w_1 - w_2| \cdot \|B\| \cdot \|A\|$;

(ii) $|q^* B (w_1 A + I)^{-1} q - q^* B (w_2 A + I)^{-1} q| \le |w_1 - w_2| \cdot |q|^2 \|B\| \cdot \|A\|$.

Lemma 5. For any Hermitian matrix A and $z \in \mathbb{C}$ with
Im(z) = v > 0, $\|(A - zI)^{-1}\| \le 1/v$.

Both Lemmas 3 and 4 require the real part of w (or $w_1$, $w_2$) to be
nonnegative. In our proof of Theorem 1, the requirements will be fulfilled
thanks to the following lemma.

Lemma 6. Let $z = iv \in \mathbb{C}$ with v > 0, A be a p x p Hermitian
nonnegative definite matrix, $q \in \mathbb{C}^p$, $\tau > 0$. Then

$-\frac{1}{z} \cdot \frac{1}{1 + \tau q^* (A - zI)^{-1} q} \in Q_1 = \{z \in \mathbb{C} : \mathrm{Re}(z) \ge 0, \mathrm{Im}(z) \ge 0\}.$

Lemma 7. Let $z = iv \in \mathbb{C}$ with v > 0, A be any p x p matrix, and
B be a p x p Hermitian nonnegative definite matrix. Then
$\mathrm{tr}(A (B - zI)^{-1} A^*) \in Q_1$.

Lemma 8. Suppose that $w_s \in D([0, 1); \mathbb{R}_+)$. Then for any
$y \in \mathbb{C}$, the equation

$\int_0^1 \frac{w_s}{1 + z w_s}\,ds = y$

admits at most one solution in $Q_1$.

The following result is an immediate consequence of Lemma 2.7 of Bai 
and Silverstein (1998). 

Lemma 9. For $\mathbf{X} = (X^{(1)}, \ldots, X^{(p)})^T$ where $X^{(j)}$'s are
i.i.d. random variables such that $E X^{(1)} = 0$, $E|X^{(1)}|^2 = 1$, and
$E|X^{(1)}|^{2k} < \infty$ for some $2 \le k \in \mathbb{N}$, there exists
$C_k \ge 0$, depending only on k, $E|X^{(1)}|^4$ and $E|X^{(1)}|^{2k}$, such
that for any p x p nonrandom matrix A,

$E|\mathbf{X}^* A \mathbf{X} - \mathrm{tr}(A)|^{2k} \le C_k (\mathrm{tr}(AA^*))^k \le C_k p^k \|A\|^{2k}.$

Proposition 6 [Theorem 2 of Geronimo and Hill (2003)]. Suppose that $P_n$
are real probability measures with Stieltjes transforms $m_n(z)$. Let
$K \subset \mathbb{C}^+$ be an infinite set with a limit point in
$\mathbb{C}^+$. If $\lim m_n(z) := m(z)$ exists for all $z \in K$, then there
exists a probability measure P with Stieltjes transform m(z) if and only if

(3.1)   $\lim_{v \to \infty} iv \cdot m(iv) = -1,$

in which case $P_n \to P$ in distribution.
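3.2. Proof of Proposition 3. By assumption, $\gamma_t$ is positive and
non-constant on [0, 1), and is cadlag, in particular, right-continuous;
moreover, $\int_0^1 \gamma_t^2\,dt = \sigma^2$. Hence, there exists
$\delta > 0$ and $[c, d] \subset [0, 1]$ such that

$\gamma_t \ge \sigma(1 + \delta)$ for all $t \in [c, d]$.

Therefore, if $[(\ell - 1)/n, \ell/n] \subset [c, d]$,

$\Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T \overset{d}{=} \int_{(\ell-1)/n}^{\ell/n} \gamma_t^2\,dt \cdot \mathbf{Z}_\ell (\mathbf{Z}_\ell)^T \ge \frac{(1 + \delta)^2 \sigma^2}{n} \cdot \mathbf{Z}_\ell (\mathbf{Z}_\ell)^T,$

where $\mathbf{Z}_\ell = (Z_\ell^{(1)}, \ldots, Z_\ell^{(p)})^T$ consists of
independent standard normals. Hence, if we let
$J_n = \{\ell : [(\ell - 1)/n, \ell/n] \subset [c, d]\}$ and

$\Gamma_p = \sum_{\ell \in J_n} \Delta\mathbf{X}_\ell (\Delta\mathbf{X}_\ell)^T, \qquad \Lambda_p = \frac{\sigma^2}{n(d - c)} \sum_{\ell \in J_n} \mathbf{Z}_\ell (\mathbf{Z}_\ell)^T,$

then for any $x \ge 0$, by Weyl's Monotonicity theorem [see, e.g.,
Corollary 4.3.3 in Horn and Johnson (1990)],

(3.2)   $F^{\Sigma_p^{\mathrm{RCV}}}(x) \le F^{\Gamma_p}(x) \le F^{\Lambda_p}(x / [(1 + \delta)^2 (d - c)]).$

Now note that $\#J_n \sim (d - c) n$, hence if $p/n \to y$, by Proposition 2,
$F^{\Lambda_p}$ will converge almost surely to the Marcenko-Pastur law with
ratio index $y' = y/(d - c)$ and scale index $\sigma^2$, which has density on
$[a(y'), b(y')]$ with functions $a(\cdot)$ and $b(\cdot)$ defined by (1.4). By
the formula of $b(\cdot)$,

$(1 + \delta)^2 (d - c)\,b(y') = (1 + \delta) \cdot \sigma^2 (1 + \delta)(y + 2\sqrt{(d - c) y} + d - c).$

Hence, for any $\varepsilon > 0$, there exists $y_c > 0$ such that for all
$y > y_c$,

$(1 + \delta)^2 (d - c)\,b(y') \ge (1 + \delta) \cdot \sigma^2 ((1 + \sqrt{y})^2 + \varepsilon) = (1 + \delta)(b(y) + \sigma^2 \varepsilon),$

that is,

$\frac{b(y) + \sigma^2 \varepsilon}{(1 + \delta)^2 (d - c)} \le \frac{b(y')}{1 + \delta}.$

By (3.2), when the above inequality holds,

$\limsup F^{\Sigma_p^{\mathrm{RCV}}}(b(y) + \sigma^2 \varepsilon) \le \mathrm{MP}^{(y',\sigma^2)}(b(y')/(1 + \delta)) < 1.$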

3.3. Proof of Theorem 1. To prove Theorem 1, following the strategies in 
Marcenko and Pastur (1967), Silverstein (1995), Silverstein and Bai (1995), 
we will work with Stieltjes transforms. 

Proof of Theorem 1. For notational ease, we shall sometimes omit 
the sub/superscripts p and n in the arguments below: thus, we write Z^ 
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instead of T^f\ we instead of zu", S instead of Sp, S instead of Sp, etc. Also 
recall that y„ =p/t^^ which converges to y > 0. 

By assumption (A.vi) we may, without loss of generality, assume that the
weights $w_\ell$ are independent of the $Z_\ell$'s. This is because, if we let $\tilde Z_\ell$ be the
result of replacing $Z_\ell^{(j)}$, $j \in I_p$, with independent random variables with
the same distribution that are also independent of $w_\ell$, and
$\tilde S := \sum_{\ell=1}^n w_\ell \cdot \Sigma^{1/2} \tilde Z_\ell (\tilde Z_\ell)^T \Sigma^{1/2}$, then $\mathrm{rank}(\tilde S - S) \le 2\eta_p$, and so by the rank inequality

$\|F^A - F^B\| \le \dfrac{\mathrm{rank}(A - B)}{p}$ for any $p \times p$ symmetric matrices $A$, $B$

[see, e.g., Lemma 2.2 in Bai (1999)], $S$ and $\tilde S$ must have the same LSD.

We proceed according to whether $H$ is a delta measure at 0 or not. If $H$
is a delta measure at 0, we claim that $F^S$ is also a delta measure at 0, and
the conclusion of the theorem holds. The reason is as follows. By assumption (A.v),

$S = \sum_{\ell=1}^n w_\ell \cdot \Sigma^{1/2} Z_\ell (Z_\ell)^T \Sigma^{1/2} \le \dfrac{\kappa}{n} \sum_{\ell=1}^n \Sigma^{1/2} Z_\ell (Z_\ell)^T \Sigma^{1/2} := \kappa \bar S$.

Hence by Weyl's Monotonicity theorem again, for any $x \ge 0$,

$F^S(x) \ge F^{\bar S}(x/\kappa)$.

However, it follows easily from Proposition 1 that $F^{\bar S}$ converges to the delta
measure at 0, hence so does $F^S$.

Below we assume that H is not a delta measure at 0. 

Let $I = I_{p\times p}$ be the $p \times p$ identity matrix, and

$m_n := m_n(z) = \dfrac{\mathrm{tr}((S - zI)^{-1})}{p}$

be the Stieltjes transform of $F^S$. By Proposition 6, in order to show that $F^S$
converges, it suffices to prove that for all $z = iv$ with $v > 0$ sufficiently large,
$\lim_n m_n(z) := m(z)$ exists, and that $m(z)$ satisfies condition (3.1).

We first show the convergence of $m_n(z)$ for $z = iv$ with $v > 0$ sufficiently
large. Since for all $n$, $|m_n(z)| \le 1/v$, it suffices to show that $\{m_n(z)\}$ has at
most one limit.
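The two facts used here, the bound $|m_n(iv)| \le 1/v$ and the normalization (3.1), are easy to check numerically. The following is a minimal sketch, assuming the simplest special case $\Sigma = I$ and equal weights $w_\ell = 1/n$; the helper name `stieltjes` and the sizes are illustrative choices, not part of the paper.

```python
import numpy as np

def stieltjes(eigs, z):
    """Stieltjes transform of the ESD with atoms `eigs`: tr((S - zI)^{-1}) / p."""
    return np.mean(1.0 / (eigs - z))

# Illustrative special case (an assumption): Sigma = I and w_l = 1/n.
rng = np.random.default_rng(0)
p, n = 50, 200
Z = rng.standard_normal((p, n))
S = Z @ Z.T / n
eigs = np.linalg.eigvalsh(S)

for v in (1.0, 10.0, 1e4):
    m = stieltjes(eigs, 1j * v)
    # |m(iv)| <= 1/v for every v > 0, and iv * m(iv) approaches -1 as v grows.
    print(v, abs(m) <= 1.0 / v, 1j * v * m)
```

Because the eigenvalues are real, each term $1/(\lambda - iv)$ has modulus at most $1/v$, which is all the bound uses.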

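For notational ease, we denote by $r_\ell = \Sigma^{1/2} Z_\ell$. We first show that

(3.3)    $\max_{\ell=1,\ldots,n} \bigl| |r_\ell|^2/p - h \bigr| = \max_{\ell=1,\ldots,n} |Z_\ell^T \Sigma Z_\ell/p - h| \to 0$ almost surely,

where $h = \int x\,dH(x)$. In fact, by Lemma 9 and assumptions (A.i')
and (A.vii), for any $k \in \mathbb{N}$,

$E|Z_\ell^T \Sigma Z_\ell - \mathrm{tr}(\Sigma)|^{2k} \le C_k\, p^{k}\, p^{2\delta k}$ for all $1 \le \ell \le n$.

Using Markov's inequality we get that for any $\varepsilon > 0$,

$P\bigl(|Z_\ell^T \Sigma Z_\ell - \mathrm{tr}(\Sigma)| \ge p\varepsilon\bigr) \le \dfrac{C_k\, p^{k+2\delta k}}{p^{2k}\varepsilon^{2k}} = \dfrac{C_k}{p^{k(1-2\delta)}\varepsilon^{2k}}$ for all $1 \le \ell \le n$.

Hence, choosing $k > 2/(1-2\delta)$, using Borel-Cantelli and that $n = O(p)$ yield

(3.4)    $\max_{\ell=1,\ldots,n} |Z_\ell^T \Sigma Z_\ell/p - \mathrm{tr}(\Sigma)/p| \to 0$ almost surely.

The convergence (3.3) follows.

Next, let

(3.5)    $M_n = M_n(z) = -\dfrac{1}{z} \sum_{\ell=1}^n \dfrac{w_\ell}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell}$,

where

$S_{(\ell)} := \sum_{j \ne \ell} w_j r_j r_j^T = S - w_\ell r_\ell r_\ell^T$.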

Note that by Lemma 6, for any i, 

(3-6) -77- TT^ n^^^^i- 

We shall show that

(3.7)    $\dfrac{1}{p}\mathrm{tr}(-zM_n\Sigma - zI)^{-1} - m_n \to 0$ almost surely.

Observe the following identity: for any $p \times p$ matrix $B$, $q \in \mathbb{R}^p$ and $r \in \mathbb{C}$
for which $B$ and $B + rqq^T$ are both invertible,

(3.8)    $q^T (B + rqq^T)^{-1} = \dfrac{1}{1 + r q^T B^{-1} q}\, q^T B^{-1}$;

see equation (2.2) in Silverstein and Bai (1995). Writing

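$S - zI - (-zM_n\Sigma - zI) = \sum_{\ell=1}^n w_\ell r_\ell r_\ell^T - (-zM_n\Sigma)$,

taking the inverse, using (3.8) and the definition (3.5) of $M_n$ yield

$(-zM_n\Sigma - zI)^{-1} - (S - zI)^{-1}$

$\quad = (-zM_n\Sigma - zI)^{-1} \Bigl( \sum_{\ell=1}^n w_\ell r_\ell r_\ell^T - (-zM_n\Sigma) \Bigr) (S - zI)^{-1}$

$\quad = -\dfrac{1}{z} \sum_{\ell=1}^n \dfrac{w_\ell}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell}\, (M_n\Sigma + I)^{-1} r_\ell r_\ell^T (S_{(\ell)} - zI)^{-1}$

$\qquad + \dfrac{1}{z} \sum_{\ell=1}^n \dfrac{w_\ell}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell}\, (M_n\Sigma + I)^{-1} \Sigma (S - zI)^{-1}$.

Taking trace and dividing by $p$ we get

$\dfrac{1}{p}\mathrm{tr}(-zM_n\Sigma - zI)^{-1} - m_n = \dfrac{1}{z} \sum_{\ell=1}^n \dfrac{w_\ell}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell} \cdot d_\ell$,

where

$d_\ell = \dfrac{1}{p}\Bigl( \mathrm{tr}\bigl((M_n\Sigma + I)^{-1}\Sigma(S - zI)^{-1}\bigr) - r_\ell^T (S_{(\ell)} - zI)^{-1}(M_n\Sigma + I)^{-1} r_\ell \Bigr)$.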
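Identity (3.8), which drives the expansion above, is a rank-one update formula and can be sanity-checked numerically. The sketch below is only an illustration: the matrix $B$ is taken of the form $S - zI$ with $z = 2i$, and the sizes, seed and scalar $r$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
G = rng.standard_normal((p, p))
B = G @ G.T - 2.0j * np.eye(p)        # of the form S - zI with z = 2i, hence invertible
q = rng.standard_normal(p)            # real vector q
r = 0.7                               # plays the role of a weight w_l

# Left-hand side of (3.8): q^T (B + r q q^T)^{-1}
lhs = q @ np.linalg.inv(B + r * np.outer(q, q))

# Right-hand side of (3.8): q^T B^{-1} / (1 + r q^T B^{-1} q)
Binv = np.linalg.inv(B)
quad = q @ Binv @ q
rhs = (q @ Binv) / (1.0 + r * quad)

print(np.allclose(lhs, rhs), quad.real >= 0)
```

The second printed value illustrates the fact used next in the proof: for $z = iv$ with $v > 0$, the quadratic form $q^T (S - zI)^{-1} q$ has nonnegative real part.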

By (5.2) in the proof of Lemma 6 in the supplementary article [Zheng and
Li (2011)], $\mathrm{Re}(r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell) \ge 0$. Hence,

(3.9)    $\left| \dfrac{1}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell} \right| \le 1$.

Therefore in order to show (3.7), by assumption (A.v), it suffices to prove

(3.10)    $\max_{\ell=1,\ldots,n} |d_\ell| \to 0$ almost surely.

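Define

$M_{(\ell)} = M_{(\ell)}(z) = -\dfrac{1}{z} \sum_{j \ne \ell} \dfrac{w_j}{1 + w_j r_j^T (S_{(j,\ell)} - zI)^{-1} r_j}$,

where $S_{(j,\ell)} := \sum_{i \ne j,\ell} w_i r_i r_i^T = S - (w_j r_j r_j^T + w_\ell r_\ell r_\ell^T)$. Observe that for every
$\ell$, $M_{(\ell)}$ is independent of $Z_\ell$.

Claim 1. For any $z = iv$ with $v > 0$ and any $\varepsilon < 1/2 - \delta$,

(3.11)    $\max_{\ell=1,\ldots,n} p^{\varepsilon} |M_{(\ell)}(z) - M_n(z)| \to 0$ almost surely.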



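Proof. Define

$\tilde m_n(z) = \dfrac{\mathrm{tr}(\Sigma^{1/2}(S - zI)^{-1}\Sigma^{1/2})}{p}$,

which belongs to $Q_1$ by Lemma 7, and

(3.12)    $\tilde M_n = \tilde M_n(z) = -\dfrac{1}{z}\sum_{\ell=1}^n \dfrac{w_\ell}{1 + y_n n w_\ell \tilde m_n(z)}$,

$\tilde M_{(\ell)}(z) = -\dfrac{1}{z}\sum_{j \ne \ell} \dfrac{w_j}{1 + y_n n w_j \tilde m_n(z)} = \tilde M_n(z) + \dfrac{1}{z}\cdot\dfrac{w_\ell}{1 + y_n n w_\ell \tilde m_n(z)}$.

Then by a similar argument for (3.9) and using assumption (A.v),

$\max_{\ell=1,\ldots,n} |\tilde M_{(\ell)}(z) - \tilde M_n(z)| \le \dfrac{\kappa}{nv}$.

Hence, it suffices to show that

(3.13)    $p^{\varepsilon}|\tilde M_n(z) - M_n(z)| \to 0$ and $\max_{\ell=1,\ldots,n} p^{\varepsilon}|\tilde M_{(\ell)}(z) - M_{(\ell)}(z)| \to 0$ almost surely.

We shall only prove the second convergence. In fact,

$\tilde M_{(\ell)}(z) - M_{(\ell)}(z) = \dfrac{1}{z} \sum_{j \ne \ell} \dfrac{w_j \cdot y_n n w_j}{(1 + w_j r_j^T (S_{(j,\ell)} - zI)^{-1} r_j) \cdot (1 + y_n n w_j \tilde m_n(z))} \cdot \zeta_{j,\ell}$,

where

$\zeta_{j,\ell} = \tilde m_n(z) - \dfrac{r_j^T (S_{(j,\ell)} - zI)^{-1} r_j}{p}$.

Since for all $j$,

$\left| \dfrac{w_j \cdot y_n n w_j}{(1 + w_j r_j^T (S_{(j,\ell)} - zI)^{-1} r_j) \cdot (1 + y_n n w_j \tilde m_n(z))} \right| \le \dfrac{y_n \kappa^2}{n}$,

it suffices to show that

(3.14)    $\max_{\ell=1,\ldots,n} \max_{j \ne \ell} p^{\varepsilon} |\zeta_{j,\ell}| \to 0$ almost surely.

To prove this, recall that $r_j = \Sigma^{1/2} Z_j$; by Lemma 9 and the independence
between $Z_j$ and $\Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2}$, for any $k \in \mathbb{N}$,

(3.15)    $E\bigl| Z_j^T \Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2} Z_j - \mathrm{tr}(\Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2}) \bigr|^{2k}$

$\quad \le C_k\, p^k \cdot E\bigl( \|\Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2}\|^{2k} \bigr) \le \dfrac{C_k\, p^k \cdot E\|\Sigma\|^{2k}}{v^{2k}} \le \dfrac{C\, p^{k+2\delta k}}{v^{2k}}$,

where in the last line we used Lemma 5 and assumption (A.vii). Hence, for
any $\varepsilon < 1/2 - \delta$, choosing $k > 3/(1 - 2\delta - 2\varepsilon)$ and using Borel-Cantelli again,
we get

(3.16)    $\max_{\ell=1,\ldots,n} \max_{j \ne \ell} p^{\varepsilon} \left| \dfrac{Z_j^T \Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2} Z_j}{p} - \dfrac{\mathrm{tr}(\Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2})}{p} \right| \to 0$.

Furthermore, by Lemma 2 and assumption (A.vii), recalling that
$\tilde m_n(z) = \mathrm{tr}(\Sigma^{1/2}(S - zI)^{-1}\Sigma^{1/2})/p$,

(3.17)    $\max_{\ell=1,\ldots,n} \max_{j \ne \ell} \left| \dfrac{1}{p}\mathrm{tr}(\Sigma^{1/2}(S_{(j,\ell)} - zI)^{-1}\Sigma^{1/2}) - \tilde m_n(z) \right| \le \dfrac{C\|\Sigma\|}{pv} \le \dfrac{C p^{\delta}}{pv}$.

The convergence (3.14) follows. □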

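We now continue the proof of the theorem. Recall that $r_\ell = \Sigma^{1/2} Z_\ell$, and $Z_\ell$
consists of i.i.d. random variables with finite moments of all orders. By Lemma 4(ii) and (3.6),

(3.18)    $\max_{\ell=1,\ldots,n} \dfrac{\bigl| r_\ell^T (S_{(\ell)} - zI)^{-1}(M_n\Sigma + I)^{-1} r_\ell - r_\ell^T (S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1} r_\ell \bigr|}{p}$

$\quad \le \max_{\ell=1,\ldots,n} \dfrac{|M_{(\ell)} - M_n(z)| \cdot |r_\ell|^2 \cdot \|(S_{(\ell)} - zI)^{-1}\| \cdot \|\Sigma\|}{p}$

$\quad \le \max_{\ell=1,\ldots,n} \dfrac{|M_{(\ell)} - M_n(z)| \cdot C p^{\delta}}{v} \cdot \dfrac{|r_\ell|^2}{p} \to 0$,

where in the last line we used Lemma 5, assumption (A.vii), the assumption
that $\delta < 1/6$ (and hence $\delta < 1/2 - \delta$) and (3.11), and (3.3).

Furthermore, similar to (3.15), by Lemma 9 and the independence between
$Z_\ell$ and $\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1}\Sigma^{1/2}$, for any $k \in \mathbb{N}$,

$E\bigl| Z_\ell^T \Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1}\Sigma^{1/2} Z_\ell - \mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1}\Sigma^{1/2}) \bigr|^{2k}$

$\quad \le \dfrac{C_k\, p^k \cdot E\|\Sigma\|^{2k}}{v^{2k}} \le \dfrac{C\, p^{k+2\delta k}}{v^{2k}}$,

where in the last line we use Lemmas 5, 3 and (3.6), and assumption (A.vii).
Hence, choosing $k > 2/(1 - 2\delta)$ and using Borel-Cantelli again, we get

(3.19)    $\max_{\ell=1,\ldots,n} \left| \dfrac{Z_\ell^T \Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1}\Sigma^{1/2} Z_\ell}{p} - \dfrac{\mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1}\Sigma^{1/2})}{p} \right| \to 0$.

Furthermore, by Lemmas 4(i), 3 and (3.6), the assumption that $\delta < 1/6$ (and
hence $2\delta < 1/2 - \delta$) and (3.11), and assumption (A.vii),

(3.20)    $\max_{\ell=1,\ldots,n} \left| \dfrac{1}{p}\mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_{(\ell)}\Sigma + I)^{-1}\Sigma^{1/2}) - \dfrac{1}{p}\mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_n\Sigma + I)^{-1}\Sigma^{1/2}) \right|$

$\quad \le \max_{\ell=1,\ldots,n} |M_{(\ell)} - M_n| \cdot \|(S_{(\ell)} - zI)^{-1}\| \cdot \|\Sigma\|^2 \le \max_{\ell=1,\ldots,n} |M_{(\ell)} - M_n| \cdot \dfrac{C p^{2\delta}}{v} \to 0$.

Finally, similar to (3.17), by Lemmas 2 and 3, and assumption (A.vii),

(3.21)    $\max_{\ell=1,\ldots,n} \left| \dfrac{1}{p}\mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}(M_n\Sigma + I)^{-1}\Sigma^{1/2}) - \dfrac{1}{p}\mathrm{tr}(\Sigma^{1/2}(S - zI)^{-1}(M_n\Sigma + I)^{-1}\Sigma^{1/2}) \right|$

$\quad \le \dfrac{\|(M_n\Sigma + I)^{-1}\| \cdot \|\Sigma\|}{pv} \le \dfrac{C p^{\delta}}{pv} \to 0$.

Combining (3.18), (3.19), (3.20) and (3.21), we see that (3.10), and hence (3.7), holds.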

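Now we are ready to show that $\{m_n(z)\}$ admits at most one limit.

Claim 2. Suppose that $m_{n_k}(z)$ converges to $m(z)$, then

(3.22)    $M_{n_k}(z) \to M(z) := -\dfrac{1}{z}\int_0^1 \dfrac{w_s}{1 + y\tilde m(z) w_s}\,ds \ne 0$,

where $\tilde m(z)$ is the unique solution in $Q_1 = \{z \in \mathbb{C} : \mathrm{Re}(z) \ge 0, \mathrm{Im}(z) \ge 0\}$ to
the following equation:

(3.23)    $\int_0^1 \dfrac{1}{1 + y\tilde m(z) w_s}\,ds = 1 - y(1 + zm(z))$.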



Proof. Writing

$S - zI + zI = \sum_{\ell=1}^n w_\ell r_\ell r_\ell^T$,

right-multiplying both sides by $(S - zI)^{-1}$ and using (3.8) we get

$I + z(S - zI)^{-1} = \sum_{\ell=1}^n \dfrac{w_\ell}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell}\, r_\ell r_\ell^T (S_{(\ell)} - zI)^{-1}$.

Taking trace and dividing by $n$ we get

$y_n + z y_n m_n(z) = 1 - \dfrac{1}{n}\sum_{\ell=1}^n \dfrac{1}{1 + w_\ell r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell}$,

where, recall that, $m_n(z) = \mathrm{tr}((S - zI)^{-1})/p$ is the Stieltjes transform of $F^S$.
Hence, if $m_{n_k}(z) \to m(z)$, then

(3.24)    $\dfrac{1}{n_k}\sum_{\ell=1}^{n_k} \dfrac{1}{1 + y_{n_k} n_k w_\ell \cdot r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell / p_k} = 1 - y_{n_k}(1 + z m_{n_k}(z)) \to 1 - y(1 + zm(z))$.

However, by the same arguments for (3.16) and (3.17) we have

(3.25)    $\max_{\ell=1,\ldots,n} \left| \dfrac{r_\ell^T (S_{(\ell)} - zI)^{-1} r_\ell}{p} - \dfrac{\mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}\Sigma^{1/2})}{p} \right| \to 0$

and

(3.26)    $\max_{\ell=1,\ldots,n} \left| \dfrac{\mathrm{tr}(\Sigma^{1/2}(S_{(\ell)} - zI)^{-1}\Sigma^{1/2})}{p} - \tilde m_n(z) \right| \to 0$,

where, recall that $\tilde m_n(z) = \mathrm{tr}(\Sigma^{1/2}(S - zI)^{-1}\Sigma^{1/2})/p$, which belongs to $Q_1$ by
Lemma 7. Then by (3.24), assumption (A.v) and Lemma 8, $\tilde m_{n_k}(z)$ must also
converge, and the limit, denoted by $\tilde m(z) \in Q_1$, must be the unique solution
in $Q_1$ to the equation (3.23). Now by (3.12), (3.13) and assumption (A.v),
we get the convergence for $M_{n_k}(z)$ in the claim. That $M(z) \ne 0$ follows from
the expression and that $\tilde m(z) \in Q_1$. □

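We now continue the proof of the theorem. By the convergence of $F^{\Sigma_p}$
to $H$ and the previous claim,

$\dfrac{\mathrm{tr}((-zM_{n_k}(z)\Sigma - zI)^{-1})}{p} \to -\dfrac{1}{z}\int_{\tau \in \mathbb{R}} \dfrac{1}{\tau M(z) + 1}\,dH(\tau)$.

But (3.7) implies that

(3.27)    $m(z) = -\dfrac{1}{z}\int_{\tau \in \mathbb{R}} \dfrac{1}{\tau M(z) + 1}\,dH(\tau)$.

Observing that $M(z) \ne 0$, $\mathrm{Re}(M(z)) \ge 0$, and $H$ is not a delta measure
at 0, we obtain that $|m(z)| < 1/|z|$. Hence $1 + zm(z) \ne 0$, and by (3.23),
$\tilde m(z) \ne 0$. Based on this, we can get another expression for $M(z)$, as follows.
By (3.22), we have

(3.28)    $M(z) = -\dfrac{1}{z}\int_0^1 \dfrac{w_s}{1 + y\tilde m(z) w_s}\,ds$

$\quad = -\dfrac{1}{z} \cdot \dfrac{1}{y\tilde m(z)} \left( 1 - \int_0^1 \dfrac{1}{1 + y\tilde m(z) w_s}\,ds \right)$

$\quad = -\dfrac{1}{z} \cdot \dfrac{1}{y\tilde m(z)} \cdot \bigl( 1 - (1 - y(1 + zm(z))) \bigr)$

$\quad = -\dfrac{1}{z} \cdot \dfrac{1 + zm(z)}{\tilde m(z)}$,

where in the third line we used the definition (3.23) of $\tilde m(z)$.
We can then derive another formula for $\tilde m(z)$. By (3.27),

$1 + zm(z) = 1 - \int_{\tau \in \mathbb{R}} \dfrac{1}{\tau M(z) + 1}\,dH(\tau) = M(z)\int_{\tau \in \mathbb{R}} \dfrac{\tau}{\tau M(z) + 1}\,dH(\tau)$

by using that $H$ is a probability distribution. Dividing both sides by
$-z\tilde m(z)$ $(\ne 0)$ and using (3.28) yield

$M(z) = -\dfrac{1 + zm(z)}{z\tilde m(z)} = -\dfrac{M(z)\int_{\tau \in \mathbb{R}} \tau/(\tau M(z) + 1)\,dH(\tau)}{z\tilde m(z)}$,

and hence since $M(z) \ne 0$,

(3.29)    $\tilde m(z) = -\dfrac{1}{z}\int_{\tau \in \mathbb{R}} \dfrac{\tau}{\tau M(z) + 1}\,dH(\tau)$.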

Observe that by Lemma 7 and (3.6), for any $n$, both $\tilde m_n(z)$ and $M_n(z)$
belong to $Q_1$, hence so do $\tilde m(z)$ and $M(z)$. We proceed to show that for
those $z = iv$ with $v$ sufficiently large, there is at most one triple
$(m(z), M(z), \tilde m(z)) \in Q_1 \times Q_1 \times Q_1$ that solves the equations (3.27), (3.22) and (3.29).
In fact, suppose that there are two different triples $(m_i(z), M_i(z), \tilde m_i(z))$, $i = 1,2$, both
satisfying (3.27), (3.22) and (3.29). Then necessarily, $M_1(z) \ne M_2(z)$ and
$\tilde m_1(z) \ne \tilde m_2(z)$. Now by (3.22),

$M_1(z) - M_2(z) = -\dfrac{1}{z} \cdot y(\tilde m_2(z) - \tilde m_1(z)) \int_0^1 \dfrac{w_s^2}{(1 + y\tilde m_1(z) w_s)(1 + y\tilde m_2(z) w_s)}\,ds$;

by (3.29),

$\tilde m_1(z) - \tilde m_2(z) = \dfrac{1}{z}(M_1(z) - M_2(z)) \int_{\tau \in \mathbb{R}} \dfrac{\tau^2}{(\tau M_1(z) + 1)(\tau M_2(z) + 1)}\,dH(\tau)$.

Therefore,

(3.30)    $1 = \dfrac{y}{z^2} \int_0^1 \dfrac{w_s^2}{(1 + y\tilde m_1(z) w_s)(1 + y\tilde m_2(z) w_s)}\,ds \times \int_{\tau \in \mathbb{R}} \dfrac{\tau^2}{(\tau M_1(z) + 1)(\tau M_2(z) + 1)}\,dH(\tau)$.

However, since $(M_i(z), \tilde m_i(z)) \in Q_1 \times Q_1$, $i = 1,2$,

$\left| \int_0^1 \dfrac{w_s^2}{(1 + y\tilde m_1(z) w_s)(1 + y\tilde m_2(z) w_s)}\,ds \right| \le \int_0^1 w_s^2\,ds < \infty$
and 

Hence, for $z = iv$ with $v$ sufficiently large, (3.30) cannot be true.
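The uniqueness argument is essentially a contraction argument for large $v$, which suggests a simple fixed-point iteration for solving (3.22), (3.29) and (3.27) numerically. The following is a minimal sketch under stated assumptions: $H$ is taken to be a discrete distribution (atoms `taus` with probabilities `probs`), the weight function $w_s$ is represented by its values on an equally spaced grid, and the function name `solve_system` is hypothetical. It is not an algorithm from the paper.

```python
import numpy as np

def solve_system(z, y, w_grid, taus, probs, n_iter=200):
    """Fixed-point iteration for (M, m_tilde); returns (m, M, m_tilde).

    w_grid : values of w_s on an equally spaced grid of [0, 1] (assumption)
    taus, probs : atoms and probabilities of a discrete H (assumption)
    """
    m_t = 0.0 + 0.0j
    M = 0.0 + 0.0j
    for _ in range(n_iter):
        # (3.22): M = -(1/z) * int_0^1 w_s / (1 + y m_tilde w_s) ds, as a grid average
        M = -np.mean(w_grid / (1.0 + y * m_t * w_grid)) / z
        # (3.29): m_tilde = -(1/z) * int tau / (tau M + 1) dH(tau)
        m_t = -np.sum(probs * taus / (taus * M + 1.0)) / z
    # (3.27): m = -(1/z) * int 1 / (tau M + 1) dH(tau)
    m = -np.sum(probs / (taus * M + 1.0)) / z
    return m, M, m_t

# Sanity check in the classical case w_s = 1, H = delta_1: the solution should
# satisfy the Marcenko-Pastur quadratic y z m^2 + (z + y - 1) m + 1 = 0.
z, y = 5.0j, 0.5
m, M, m_t = solve_system(z, y, np.ones(100), np.array([1.0]), np.array([1.0]))
residual = y * z * m**2 + (z + y - 1.0) * m + 1.0
print(abs(residual))
```

For $z = iv$ with small $v$ the plain iteration is not guaranteed to converge; the sketch is only meant for $v$ reasonably large, in line with the argument above.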

It remains to verify (3.1), that is, $\lim_{v\to\infty} iv \cdot m(iv) = -1$. In fact, using (3.27)
we get that

(3.31)    $iv\,m(iv) = -\int_{\tau \in \mathbb{R}} \dfrac{1}{1 + \tau M(iv)}\,dH(\tau)$.

Since $\mathrm{Re}(M(iv)) \ge 0$, $|1/(1 + \tau M(iv))| \le 1$ for all $\tau \ge 0$. Moreover, by (3.22)
and that $\mathrm{Re}(\tilde m(z)) \ge 0$, $|M(iv)| \le 1/v \cdot \int_0^1 w_s\,ds$, hence by the dominated
convergence theorem, the right-hand side of (3.31) converges to $-1$ as $v \to \infty$. □

3.4. Proof of Theorem 2. The TVARCV matrix has the form of weighted 
sample covariance matrices as studied in Theorem 1; however, assump- 
tion (A.vi) therein is not satisfied, and we need another proof. 

Theorem 2 is a direct consequence of the following two convergence re- 
sults. 

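Proposition 7. Under assumption (C.iv), namely, suppose that

$\lim_{p\to\infty} \mathrm{tr}(\Sigma_p)/p = \theta$,

then, almost surely, $\lim_{p\to\infty} \mathrm{tr}(\Sigma_p^{RCV})/p = \theta$.

The proof is given in the supplemental article [Zheng and Li (2011)].
Next, recall that $\breve\Sigma_p$ and $\tilde\Sigma_p$ are defined by (2.12) and (2.11), respectively.

Proposition 8. Under the assumptions of Theorem 2, both $F^{\breve\Sigma_p}$ and $F^{\tilde\Sigma_p}$
converge almost surely. $F^{\breve\Sigma_p}$ converges to $\breve H$ defined by

(3.32)    $\breve H(x) = H(\theta x)$ for all $x \ge 0$.

The LSD $\tilde F$ of $\tilde\Sigma_p$ is determined by $\breve H$ in that its Stieltjes transform $m_{\tilde F}(z)$
satisfies the equation

$m_{\tilde F}(z) = \int_{\tau \in \mathbb{R}} \dfrac{1}{\tau(1 - y(1 + z m_{\tilde F}(z))) - z}\,d\breve H(\tau)$.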
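This can be proved in very much the same way as Theorem 1, by working
with Stieltjes transforms. However, a much simpler and more transparent proof is
as follows.

Proof of Proposition 8. The convergence of $F^{\breve\Sigma_p}$ is obvious since
$F^{\breve\Sigma_p}(x) = F^{\Sigma_p}(\mathrm{tr}(\Sigma_p)/p \cdot x)$ for all $x \ge 0$.

We now show the convergence of $F^{\tilde\Sigma_p}$. As in the proof of Theorem 1, for
notational ease, we shall sometimes omit the superscript $p$ in the arguments
below: thus, we write $\mu_t$ instead of $\mu_t^{(p)}$, $\gamma_t$ instead of $\gamma_t^{(p)}$, $\Lambda$ instead of $\Lambda^{(p)}$,
etc.

First, note that

$\Delta X_\ell = \int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \mu_t\,dt + \Lambda \cdot \int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t\,dW_t := \sqrt{\int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t^2\,dt} \cdot (v_\ell + \Lambda \cdot Z_\ell)$,

where

$v_\ell = (v_\ell^{(1)},\ldots,v_\ell^{(p)})^T = \dfrac{\int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \mu_t\,dt}{\sqrt{\int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t^2\,dt}}$ and $Z_\ell = (Z_\ell^{(1)},\ldots,Z_\ell^{(p)})^T = \dfrac{\int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t\,dW_t}{\sqrt{\int_{\tau_{n,\ell-1}}^{\tau_{n,\ell}} \gamma_t^2\,dt}}$.

By performing an orthogonal transformation if necessary, without loss of
generality, we may assume that the index set $I_p \subset \{1,\ldots,\eta_p\}$. Then by
assumptions (C.ii) and (C.ix), for $j > \eta_p$, $Z_\ell^{(j)}$ are i.i.d. $N(0,1)$. Write
$U_\ell = (Z_\ell^{(1)},\ldots,Z_\ell^{(\eta_p)})^T$ and $D_\ell = (Z_\ell^{(\eta_p+1)},\ldots,Z_\ell^{(p)})^T$. With the above notation, $\tilde\Sigma_p$
can be rewritten as

(3.33)    $\tilde\Sigma_p = y_n \sum_{\ell=1}^n \dfrac{\Delta X_\ell (\Delta X_\ell)^T}{|\Delta X_\ell|^2} = y_n \sum_{\ell=1}^n \dfrac{(v_\ell + \Lambda Z_\ell)(v_\ell + \Lambda Z_\ell)^T}{|v_\ell + \Lambda Z_\ell|^2}$.

By assumptions (C.i), (C.ii) and (C.ix), there exists $C > 0$ such that
$|v_\ell^{(j)}| \le C/\sqrt{n}$ for all $j$ and $\ell$, hence the $|v_\ell|$'s are uniformly bounded. We will
show that

(3.34)    $\max_{\ell=1,\ldots,n} \bigl| |\Lambda Z_\ell|^2/p - 1 \bigr| = \max_{\ell=1,\ldots,n} |Z_\ell^T \breve\Sigma_p Z_\ell/p - 1| \to 0$ almost surely,

which clearly implies that

(3.35)    $\max_{\ell=1,\ldots,n} \bigl| |v_\ell + \Lambda Z_\ell|^2/p - 1 \bigr| \to 0$ almost surely.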

To prove (3.34), write

$\breve\Sigma_p = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}$,

where $A$, $B$ and $C$ are $\eta_p \times \eta_p$, $\eta_p \times (p - \eta_p)$ and $(p - \eta_p) \times (p - \eta_p)$ matrices,
respectively. Then

$Z_\ell^T \breve\Sigma_p Z_\ell = U_\ell^T A U_\ell + 2 D_\ell^T B^T U_\ell + D_\ell^T C D_\ell$.

By a well-known fact about the spectral norm,

$\|A\| \le \|\breve\Sigma_p\|$, $\|B\| \le \|\breve\Sigma_p\|$ and $\|C\| \le \|\breve\Sigma_p\|$.

In particular, by assumptions (C.ii), (C.vi) and (C.vii),

$0 \le \mathrm{tr}(A) \le \eta_p \cdot \|\breve\Sigma_p\| \le C p^{\delta_1 + \delta_2} = o(p)$,

hence $\mathrm{tr}(C)/p = (\mathrm{tr}(\breve\Sigma_p) - \mathrm{tr}(A))/p \to 1$. Now using the fact that $D_\ell$ consists
of i.i.d. standard normals and by the same proof as that for (3.4) we get

(3.36)    $\max_{\ell=1,\ldots,n} |D_\ell^T C D_\ell/p - 1| \to 0$ almost surely.

To complete the proof of (3.34), it then suffices to show that

$\max_{\ell=1,\ldots,n} \dfrac{|U_\ell^T A U_\ell|}{p} \to 0$ and $\max_{\ell=1,\ldots,n} \dfrac{|D_\ell^T B^T U_\ell|}{p} \to 0$ almost surely.

We shall only prove the first convergence; the second one can be proved
similarly. We have

(3.37)    $|U_\ell^T A U_\ell| \le \|A\| \cdot |U_\ell|^2 \le C p^{\delta_2} \cdot |U_\ell|^2$.

Observe that for all $1 \le i \le \eta_p$, by assumption (C.ii),






I ,7(i)|2 _ I Jt„,«-i '^ ^' ^ (-^2 
\^« — 1:^ — -. TT-. "^ 



Cll^''^* "^^".^ 






By the Burkholder-Davis-Gundy inequality, we then get that for any $k \in \mathbb{N}$,
there exists $\lambda_k > 0$ such that

(3.38)    $E|Z_\ell^{(i)}|^{2k} \le \lambda_k C_1^k$.

Now we are ready to show that $\max_{\ell=1,\ldots,n} |U_\ell^T A U_\ell|/p \to 0$. In fact, for any
$\varepsilon > 0$, for any $k \in \mathbb{N}$, by Markov's inequality, (3.37), Hölder's inequality
and (3.38),

$P\Bigl( \max_{\ell=1,\ldots,n} |U_\ell^T A U_\ell| \ge p\varepsilon \Bigr) \le \sum_{\ell=1}^n P(|U_\ell^T A U_\ell| \ge p\varepsilon) \le \sum_{\ell=1}^n \dfrac{E|U_\ell^T A U_\ell|^k}{p^k \varepsilon^k}$

$\quad \le \dfrac{n \cdot C^k p^{k\delta_2} \cdot [(\eta_p \cdot \lambda_k C_1^k) \cdot \eta_p^{k-1}]}{p^k \varepsilon^k} \le C p^{1 + k\delta_1 + k\delta_2 - k}$.

By assumption (C.vii), $\delta_1 + \delta_2 < 1/2 < 1$, hence by choosing $k$ to be large
enough, the right-hand side will be summable in $p$, hence by Borel-Cantelli,
almost surely, $\max_{\ell=1,\ldots,n} |U_\ell^T A U_\ell|/p \to 0$.

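We now get back to $\tilde\Sigma_p$ as in (3.33). By (3.35), for any $\varepsilon > 0$, almost
surely, for all $n$ sufficiently large, for all $\ell = 1,\ldots,n$,

$p(1 - \varepsilon) \le |v_\ell + \Lambda Z_\ell|^2 \le p(1 + \varepsilon)$.

Hence, almost surely, for all $n$ sufficiently large,

$\dfrac{1}{1+\varepsilon}\tilde S_p \le \tilde\Sigma_p = y_n \sum_{\ell=1}^n \dfrac{(v_\ell + \Lambda Z_\ell)(v_\ell + \Lambda Z_\ell)^T}{|v_\ell + \Lambda Z_\ell|^2} \le \dfrac{1}{1-\varepsilon}\tilde S_p$,

where $\tilde S_p = 1/n \cdot \sum_{1 \le \ell \le n} (v_\ell + \Lambda Z_\ell)(v_\ell + \Lambda Z_\ell)^T$. Hence, by Weyl's
Monotonicity theorem, for any $x \ge 0$,

(3.39)    $F^{\tilde S_p}((1+\varepsilon)x) \ge F^{\tilde\Sigma_p}(x) \ge F^{\tilde S_p}((1-\varepsilon)x)$.

Next, by Lemma 1, $\tilde S_p$ has the same LSD as $\bar S_p := 1/n \sum_{1 \le \ell \le n} \Lambda Z_\ell (Z_\ell)^T \Lambda^T$.
Moreover, by using the same trick as in the beginning of the proof of Theorem 1,
$F^{\bar S_p}$ has the same limit as $F^{\hat S_p}$, where $\hat S_p = 1/n \sum_{1 \le \ell \le n} \Lambda \tilde Z_\ell (\tilde Z_\ell)^T \Lambda^T$,
and $\tilde Z_\ell$ consists of i.i.d. standard normals. For $F^{\hat S_p}$, it follows easily from
Proposition 1 that it converges to $\tilde F$. Moreover, by Theorems 1.1 and 2.1 in
Silverstein and Choi (1995), $\tilde F$ is differentiable and in particular continuous
at all $x > 0$. It follows from (3.39) that $F^{\tilde\Sigma_p}$ must also converge to $\tilde F$. □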

4. Simulation studies. In this section, we present some simulation studies
to illustrate the behavior of ESDs of RCV and TVARCV matrices. In
particular, we show that the ESDs of RCV matrices that have the same
targeting ICV matrix $\Sigma_p$ can be quite different from each other, depending
on the time variability of the covolatility process. Our proposed estimator,
the TVARCV matrix $\hat\Sigma_p$, in contrast, has a very stable ESD.

We use in particular a reference curve which is the Marcenko-Pastur law.
The reason we compare the ESDs of RCV and TVARCV matrices with the
Marcenko-Pastur law is that the Marcenko-Pastur law is the LSD of $\Sigma_p^{RCV^0}$
defined in (1.6), which is the RCV matrix estimated from sample paths of
constant volatility that has the same targeting ICV matrix as $\Sigma_p^{RCV}$. As we
will see soon in the following two subsections, when the covolatility process
is time varying, the ESD of RCV matrix can be very different from the
Marcenko-Pastur law, while the ESD of TVARCV matrix always matches
the Marcenko-Pastur law very well.
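For readers who want to draw the reference curve themselves, the following is a minimal sketch of the Marcenko-Pastur density with ratio index $y$ and scale index $\sigma^2$. It uses the standard closed form with support edges $a(y) = \sigma^2(1-\sqrt{y})^2$ and $b(y) = \sigma^2(1+\sqrt{y})^2$, which is assumed here to agree with (1.4); the function names are illustrative.

```python
import numpy as np

def mp_edges(y, sigma2=1.0):
    """Support edges a(y), b(y) of the Marcenko-Pastur law (standard form, assumed to match (1.4))."""
    return sigma2 * (1.0 - np.sqrt(y)) ** 2, sigma2 * (1.0 + np.sqrt(y)) ** 2

def mp_density(x, y, sigma2=1.0):
    """Density of the continuous part of the Marcenko-Pastur law at the points x.

    For y <= 1 this integrates to 1; for y > 1 the continuous part has mass 1/y
    and the law has an additional atom of mass 1 - 1/y at zero.
    """
    a, b = mp_edges(y, sigma2)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2.0 * np.pi * y * sigma2 * xi)
    return dens

# Example: ratio y = p/n = 0.1 and scale 4e-4 (the integrated variance of Design I below).
a, b = mp_edges(0.1, 4e-4)
grid = np.linspace(a, b, 200001)
mass = np.sum(mp_density(grid, 0.1, 4e-4)) * (grid[1] - grid[0])
print(a, b, mass)
```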

In the simulation below, we assume that $\Lambda = I$, or in other words, $X_t$ satisfies
(2.1) with $\gamma_t$ a deterministic (scalar) process, and $W_t$ a $p$-dimensional
standard Brownian motion. The observation times are taken to be equidistant:
$\tau_{n,\ell} = \ell/n$, $\ell = 0, 1, \ldots, n$.

Fig. 1. Left panel: p = 100, n = 1,000; right panel: p = 2,000, n = 1,000. (Legend: RCV, TVARCV.)

We present simulation results of two different designs: one when $\gamma_t$ is
piecewise constant, the other when $\gamma_t$ is continuous (and non-constant).
In both cases, we compare the ESDs of the RCV and TVARCV matrices.
Results for different dimension $p$ and observation frequency $n$ are reported.

In all the figures below, we use red solid lines to represent the LSDs
of $\Sigma_p^{RCV^0}$ given by the Marcenko-Pastur law, black dashed lines to represent
the ESDs of RCV matrices, and blue bold long-dashed lines to represent the ESDs
of TVARCV matrices.



4.1. Design I, piecewise constants. We first consider the case when the
volatility path follows piecewise constants. More specifically, we take $\gamma_t$ to
be

$\gamma_t = \begin{cases} \sqrt{0.0007}, & t \in [0, 1/4) \cup [3/4, 1], \\ \sqrt{0.0001}, & t \in [1/4, 3/4). \end{cases}$


In Figure 1, we compare the ESDs of RCV and TVARCV matrices for
different pairs of $p$ and $n$, with the LSD of $\Sigma_p^{RCV^0}$ given by the Marcenko-Pastur
law as reference.

We see from Figure 1 that:

• the ESDs of RCV matrices are very different from the LSD given by the
Marcenko-Pastur law (the LSD of $\Sigma_p^{RCV^0}$);

• the ESDs of TVARCV matrices follow the LSD given by the Marcenko-Pastur
law very well, for both pairs of $p$ and $n$, even when $p$ is small
compared with $n$.
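The comparison in Design I can be reproduced along the following lines. This is a minimal sketch, not the authors' code: it assumes $\Lambda = I$ and zero drift, simulates the increments directly as Gaussian vectors, and takes the TVARCV matrix to be $\mathrm{tr}(\Sigma_p^{RCV})/p$ times the normalized matrix in (3.33), which is how Propositions 7 and 8 combine (definitions (2.11) and (2.12) are not reproduced in this section). Function names, sizes and the seed are illustrative.

```python
import numpy as np

def integrated_var_design1(n, hi=0.0007, lo=0.0001):
    """Integral of gamma_t^2 over each interval [(l-1)/n, l/n]; assumes n is divisible by 4."""
    mid = (np.arange(n) + 0.5) / n
    g2 = np.where((mid < 0.25) | (mid >= 0.75), hi, lo)
    return g2 / n

def rcv_and_tvarcv(dX):
    """dX has shape (n, p); row l is the increment Delta X_l."""
    n, p = dX.shape
    rcv = dX.T @ dX                                   # sum of Delta X_l Delta X_l^T
    U = dX / np.linalg.norm(dX, axis=1, keepdims=True)
    normalized = (p / n) * (U.T @ U)                  # y_n * sum Delta X_l Delta X_l^T / |Delta X_l|^2, cf. (3.33)
    tvarcv = np.trace(rcv) / p * normalized           # assumed form of the TVARCV matrix
    return rcv, tvarcv

rng = np.random.default_rng(0)
p, n = 200, 200
iv = integrated_var_design1(n)
dX = np.sqrt(iv)[:, None] * rng.standard_normal((n, p))
rcv, tvarcv = rcv_and_tvarcv(dX)

ev_rcv = np.linalg.eigvalsh(rcv)
ev_tv = np.linalg.eigvalsh(tvarcv)
b = iv.sum() * (1.0 + np.sqrt(p / n)) ** 2            # upper Marcenko-Pastur edge for the target ICV
print(ev_rcv[-1], ev_tv[-1], b)
```

By construction the two matrices have the same trace, so the difference between them lies entirely in how the eigenvalues are spread; histograms of `ev_rcv` and `ev_tv` can be compared with the Marcenko-Pastur reference curve.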

In fact, the dependence of the ESD of RCV matrix on the time variability
of covolatility process can be seen more clearly from Figure 2, where we
consider the same design but different values for $\gamma_t$:

$\gamma_t = \begin{cases} a^{1/2} \times 10^{-2}, & t \in [0, 1/4) \cup [3/4, 1], \\ b^{1/2} \times 10^{-2}, & t \in [1/4, 3/4), \end{cases}$

where $a + b = 8$.

Fig. 2. Comparisons, different values of piecewise constants (a, b) as shown in the legend
[(7,1), (6,2), (5,3)], which are such that the targeting ICV matrix is the same; the red solid curve is the
Marcenko-Pastur law (the LSD of $\Sigma_p^{RCV^0}$). p and n are both taken to be 1,000. Left panel:
RCV; right panel: TVARCV.
We plot the ESDs of RCV and TVARCV matrices for the case when $p =
n = 1{,}000$, in the left and right panel, respectively. The parameters $(a, b)$
corresponding to each curve are reported in the legend. Note that since all pairs $(a, b)$
have the same sum, in all cases the targeting ICV matrices are the
same.

We see clearly from Figure 2 that the ESDs of RCV matrices can be very
different from each other even though the RCV matrices are estimating
the same ICV matrix, while for TVARCV matrices the ESDs are almost
identical.
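To see why the targeting ICV matrix is the same for every pair, note that each piece of the path covers total time $1/2$. A short worked computation, assuming $\Lambda = I$ as in this section:

```latex
\int_0^1 \gamma_t^2 \, dt
  = \tfrac{1}{2}\, a \times 10^{-4} + \tfrac{1}{2}\, b \times 10^{-4}
  = \frac{a+b}{2} \times 10^{-4}
  = 4 \times 10^{-4}
  \quad \text{for } (a,b) \in \{(7,1),\,(6,2),\,(5,3)\},
\qquad\text{so}\qquad
\Sigma_p = 4 \times 10^{-4}\, I_p .
```

The pair $(7,1)$ is exactly Design I, since $\sqrt{7} \times 10^{-2} = \sqrt{0.0007}$ and $\sqrt{1} \times 10^{-2} = \sqrt{0.0001}$.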



4.2. Design II, continuous paths. We illustrate in this subsection the
case when the volatility processes have continuous sample paths. In particular,
we assume that $X_t$ satisfies (2.1) with

$\gamma_t = \sqrt{0.0009 + 0.0008\cos(2\pi t)}$, $t \in [0, 1]$.

We see from Figure 3 similar phenomena as in Design I about the ESDs
of RCV and TVARCV matrices for different pairs of $p$ and $n$.
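For Design II the per-interval integrated variance has a closed form, so increments can be simulated exactly without discretizing $\gamma_t$. The sketch below is illustrative only (the function name is hypothetical); the resulting vector can be turned into Gaussian increments as `np.sqrt(iv)[:, None] * Z` for a standard normal array `Z` of shape `(n, p)`, under the same $\Lambda = I$, zero-drift assumption as in this section.

```python
import numpy as np

def integrated_var_design2(n):
    """Exact integral of 0.0009 + 0.0008*cos(2*pi*t) over [(l-1)/n, l/n], for l = 1..n."""
    t = np.arange(n + 1) / n
    antiderivative = 0.0009 * t + 0.0008 * np.sin(2.0 * np.pi * t) / (2.0 * np.pi)
    return np.diff(antiderivative)

iv = integrated_var_design2(1000)
print(iv.sum())   # integrated variance over [0, 1]
```

The cosine term integrates to zero over a full period, so the total integrated variance is $0.0009$ and the targeting ICV matrix is $0.0009\, I_p$.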

Fig. 3. Left panel: p = 100, n = 1,000; right panel: p = 2,000, n = 1,000. (Legend: RCV, TVARCV.)

5. Conclusion and discussions. We have shown theoretically and via
simulation studies that:

• the limiting spectral distribution (LSD) of RCV matrix depends not only
on that of the ICV matrix, but also on the time-variability of covolatility
process;

• in particular, even with the same targeting ICV matrix, the empirical
spectral distribution (ESD) of RCV matrix can vary a lot, depending on
how the underlying covolatility process evolves over time;

• for a class C of processes, our proposed estimator, the time-variation ad- 
justed realized covariance (TVARCV) matrix, possesses the following de- 
sirable properties as an estimator of the ICV matrix: as long as the tar- 
geting ICV matrix is the same, the ESDs of TVARCV matrices estimated 
from processes with different covolatility paths will be close to each other, 
sharing a unique limit; moreover, the LSD of TVARCV matrix is related 
to that of the targeting ICV matrix through the same Marcenko-Pastur 
equation as in the sample covariance matrix case. 

Furthermore, we establish a Marcenko-Pastur type theorem for weighted 
sample covariance matrices. For a class C of processes, we also establish 
a Marcenko-Pastur type theorem for RCV matrices, which explicitly demon- 
strates how the time-variability of the covolatility process affects the LSD 
of RCV matrix. 

In practice, for given p and n, based on the (observable) ESD of TVARCV 
matrix, one can use existing algorithms to obtain an estimate of the ESD 
of ICV matrix, which can then be applied to further applications such as 
portfolio allocation, risk management, etc. 

Acknowledgments. We are very grateful to the Editor, the Associate Edi- 
tor and anonymous referees for their very valuable comments and suggestions. 

SUPPLEMENTARY MATERIAL 

Supplement to "On the estimation of integrated covariance matrices 
of high dimensional diffusion processes" (DOI: 10.1214/11-AOS939SUPP; 
.pdf). This material contains the proof of Proposition 4, a detailed expla- 
nation of the second statement in Remark 3, and the proofs of the various 
lemmas in Section 3.1 and Proposition 7. 




REFERENCES 

Admati, A. R. and Pfleiderer, P. (1988). A theory of intraday patterns: Volume and
price variability. Rev. Financ. Stud. 1 3-40.

Andersen, T. G. and Bollerslev, T. (1997). Intraday periodicity and volatility persis- 
tence in financial markets. Journal of Empirical Finance 4 115-158. 

Andersen, T. G. and Bollerslev, T. (1998). Deutsche mark-dollar volatility: Intra- 
day activity patterns, macroeconomic announcements, and longer run dependencies. 
J. Finance 53 219-265. 

Andersen, T. G., Bollerslev, T., Diebold, F. X. and Labys, P. (2001). The distribu- 
tion of realized exchange rate volatility. J. Amer. Statist. Assoc. 96 42-55. MR1952727 

Bai, Z. D. (1999). Methodologies in spectral analysis of large-dimensional random matrices,
a review. Statist. Sinica 9 611-677. MR1711663

Bai, Z., Chen, J. and Yao, J. (2010). On estimation of the population spectral distribu- 
tion from a high-dimensional sample covariance matrix. Aust. N. Z. J. Stat. 52 423-437. 
MR2791528 

Bai, Z., Liu, H. and Wong, W.-K. (2009). Enhancement of the applicability of 
Markowitz's portfolio optimization by utilizing random matrix theory. Math. Finance 
19 639-667. MR2583523 

Bai, Z. D. and Silverstein, J. W. (1998). No eigenvalues outside the support of the
limiting spectral distribution of large-dimensional sample covariance matrices. Ann.
Probab. 26 316-345. MR1617051

Barndorff-Nielsen, O. E. and Shephard, N. (2004). Econometric analysis of realized
covariation: High frequency based covariance, regression, and correlation in financial
economics. Econometrica 72 885-925. MR2051439

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. and Shephard, N. (2011). Mul- 
tivariate realised kernels: Consistent positive semi-definite estimators of the covariation 
of equity prices with noise and non-synchronous trading. J. Econometrics 162 149-169. 

El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices 
using random matrix theory. Ann. Statist. 36 2757-2790. MR2485012 

Fan, J., Li, Y. and Yu, K. (2011). Vast volatility matrix estimation using high frequency 
data for portfolio selection. J. Amer. Statist. Assoc. To appear. 

Geronimo, J. S. and Hill, T. P. (2003). Necessary and sufficient condition that the limit 
of Stieltjes transforms is a Stieltjes transform. J. Approx. Theory 121 54-60. MR1962995 

Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Cambridge Univ. Press, 
Cambridge. Corrected reprint of the 1985 original. MR1084815 

Jacod, J. and Protter, P. (1998). Asymptotic error distributions for the Euler method 
for stochastic differential equations. Ann. Probab. 26 267-307. MR1617049 

Marcenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues in certain sets 
of random matrices. Mat. Sb. (N.S.) 72 507-536. MR0208649 

Markowitz, H. (1952). Portfolio selection. J. Finance 7 77-91. 

Markowitz, H. M. (1959). Portfolio Selection: Efficient Diversification of Investments.
Cowles Foundation for Research in Economics at Yale University, Monograph 16. Wiley,
New York. MR0103768

Mestre, X. (2008). Improved estimation of eigenvalues and eigenvectors of covariance 
matrices using their sample estimates. IEEE Trans. Inform. Theory 54 5113-5129. 
MR2589886 

Silverstein, J. W. (1995). Strong convergence of the empirical distribution of eigenvalues 
of large-dimensional random matrices. J. Multivariate Anal. 55 331-339. MR1370408 




Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues
of a class of large-dimensional random matrices. J. Multivariate Anal. 54 175-192.
MR1345534

Silverstein, J. W. and Choi, S.-I. (1995). Analysis of the limiting spectral distribution
of large-dimensional random matrices. J. Multivariate Anal. 54 295-309. MR1345541

Tao, M., Wang, Y., Yao, Y. and Zou, J. (2011). Large volatility matrix inference via 
combining low-frequency and high-frequency approaches. J. Amer. Statist. Assoc. 106 
1025-1040. 

Wang, Y. and Zou, J. (2010). Vast volatility matrix estimation for high-frequency finan- 
cial data. Ann. Statist. 38 943-978. MR2604708 

Yin, Y. Q. (1986). Limiting spectral distribution for a class of random matrices. J. Mul- 
tivariate Anal. 20 50-68. MR0862241 

Zheng, X. and Li, Y. (2011). Supplement to "On the estimation of integrated covariance
matrices of high dimensional diffusion processes." DOI:10.1214/11-AOS939SUPP.

Department of Information Systems 

Business Statistics and Operations Management 

Hong Kong University of Science 
and Technology 

Clear Water Bay, Kowloon 

Hong Kong 

E-mail: xhzheng@ust.hk
yyli@ust.hk



